PREFACE


With the development of advanced Army Ballistic Missile Defense systems there arises the requirement for increasingly sophisticated guidance and control techniques and systems. Fundamental to this area is the competitive situation wherein a target vehicle is attempting to avoid an intercept vehicle or, to say the same thing another way, an interceptor is attempting to hit an evasively maneuvering target. This competitive situation is referred to in the technical literature as the differential game problem. Results developed for this fundamental problem area obviously apply to a wide variety of situations, not only military and strategic ones but other competitive situations as well. This report is one of a companion set of reports issued on this broad research effort, and it deals with continuous-time differential games in a stochastic environment. One of its purposes is to develop techniques which can result in the simplest possible thoroughly effective systems.


ABSTRACT 


An examination is made of some problems encountered in the optimal control of a linear dynamic system by two independent controllers with noisy state observations, the controllers having either conflicting or concurring objectives. The question of what form the optimal controls should take is also discussed. When consideration is restricted to linear forms, it is shown that the computational complexity of a general optimal linear strategy is considerable. Attention is further restricted to a particular linear form for the optimal controls: a matrix transformation of a vector which is the solution of a linear differential equation forced by the observations. Properties of certain forms of this type of control are analyzed, and it is shown that the parameters of these forms may be expressed in terms of solutions to a set of nonlinear differential equations with split boundary conditions. It is also demonstrated that these forms reduce, in the one-input case, to those specified by the separation principle of one-sided optimal control.
Chapter 1
INTRODUCTION 


A wealth of practical problems arise out of natural engineering situations in which the control of the "system" is in the hands of more than a single controller. Such multiple controllers may have varying objectives, and these objectives may be wholly or partially conflicting or concurring. As examples we might cite pursuit and evasion situations with two vehicles, rendezvous in space of two vehicles, and control of an international economic system by several state governments. In many of these natural engineering situations the controllers are required to act with imperfect information as to the true state of the system; thus, in such cases the question of how to control in a manner which is in some sense optimal is usually difficult to answer.

For this reason control theorists have often chosen to analyze abstract mathematical models which are thought to retain some important characteristics of their real-world counterparts, since such models yield more readily to analysis than the actual situations.


It is the object of this research to investigate the nature of a specific type of two-input control problem: one in which the controllers have conflicting objectives, the state of the system is described by a system of linear differential equations, the criterion functional is quadratic, and the controllers have available only state observations which are obscured by white Gaussian noise. This is a stochastic differential game situation. It is thought that a thorough analysis of this problem may reveal some interesting facts which will contribute to a greater understanding of more complicated problems.

In the realm of optimal control theory, systems which are described by linear differential equations and quadratic cost functionals have become classic objects of analysis. This is partially because they have the pedagogical advantage of yielding with relative ease solutions which illustrate theoretical principles in a simple framework. It is also because many real-world optimal control problems can be fitted into the linear-quadratic mathematical framework; hence we gain insight into the behavior of practical systems by studying the linear-quadratic models.

In the area of stochastic optimal control similar statements apply. Here the so-called "separation theorem" [9,15] enables us to combine our knowledge of the deterministic optimal control for linear-quadratic systems with the results of Kalman and Bucy [17,18] in the area of estimation and prediction of the state of stochastic dynamic systems to produce a control which is stochastically optimal in the sense that it minimizes the expected value of the cost functional. Specifically, the theory of deterministic optimal control, when applied to linear-quadratic systems, shows that the optimal control function can be expressed as linear state feedback; i.e., if we denote the control signal by U(t) and the state of the system by X(t), then

$$U_{\mathrm{opt}}(t) = K(t)X(t)$$

where K(t) is a feedback gain matrix determined by the parameters of the system. The separation theorem then shows that when the state is not directly observable the stochastically optimal control signal is

$$U_{\mathrm{stoch,opt}}(t) = K(t)\hat{X}(t)$$

where $\hat{X}(t)$ is the conditional mean of the state, based on all available knowledge and measurements, and K(t) is the deterministically optimal feedback gain. This result is intuitively satisfying in that we simply use the best (mean-square) estimate of the state in place of the actual value of the state to obtain the best realizable control function.
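The separation result can be illustrated in the simplest possible setting. The Python sketch below is an illustration under stated assumptions, not the report's method: it takes a scalar one-sided problem with dX/dt = fX - gU and cost ½[X(T)² + ∫U² dt] (constant f and g, with the minus sign on the control chosen to match the convention used later in (1.1)), integrates the associated Riccati equation backward to obtain the gain K(0) = gP(0), checks it against the closed-form value, and then applies the gain to a supplied estimate X̂ as the separation theorem prescribes. All function names are hypothetical.

```python
import math


def riccati_gain(f, g, T, n=20000):
    """Scalar one-sided LQ problem (assumed conventions):
        dX/dt = f X - g U,   cost = 0.5 * [X(T)**2 + integral of U**2 dt].
    Integrates -dP/dt = 2 f P - g**2 P**2 backward from P(T) = 1 with
    explicit Euler steps and returns K(0) = g P(0), so that U = K X."""
    dt = T / n
    p = 1.0
    for _ in range(n):
        # Step from t to t - dt: P(t - dt) ~ P(t) + dt * (2 f P - g^2 P^2)
        p += dt * (2.0 * f * p - g * g * p * p)
    return g * p


def closed_form_gain(f, g, T):
    """Exact K(0) for the same scalar problem, used as a reference value."""
    phi2 = math.exp(2.0 * f * T)  # Phi(T,0)**2
    # integral_0^T Phi(T,s)**2 ds
    integral = (phi2 - 1.0) / (2.0 * f) if f != 0 else T
    return g * phi2 / (1.0 + g * g * integral)


def certainty_equivalent_control(gain, x_hat):
    """Separation: apply the deterministic gain to the conditional mean X_hat."""
    return gain * x_hat
```

The explicit Euler step keeps the sketch short; its error shrinks linearly with the step size, which is why a fine grid is used.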

Differential games are natural objects for the application of optimal control theory, since in many cases the formulations of these problems are only slight modifications of ordinary optimal control problems with provisions for an extra control input to the plant. Indeed, differential games described by linear differential equations and quadratic payoff functionals yield, under mild restrictions, solutions which are not greatly different in nature from those of the analogous one-sided optimal control problems. To be specific, the optimal strategies for both players are linear state-feedback control functions.

A natural conjecture, then, is that in the stochastic version of the linear-quadratic differential game, where the players are unable to observe the state directly, the stochastically optimal strategy would be to employ the conditional mean of the state in place of the state in the linear feedback. Unfortunately, this conjecture is false, as is easily shown by simple counterexamples. We shall, therefore, proceed to inquire about the nature of the optimal strategies and to analyze in particular the special cases where the controllers are restricted to the use of computationally feasible (practical) strategies.

To begin the development, we shall formulate the stochastic differential game problem which will be the underlying object of analysis for the remainder of this work.

1.2  The  Deterministic  Game  Formulation 

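The differential game described by the system equation

$$\dot{X}(t) = F(t)X(t) - G_1(t)U_1(t) + G_2(t)U_2(t) \qquad (1.1)$$

(where $X(0) = X_0$, $E\{X_0\} = \bar{X}_0$, and $\mathrm{Cov}\{X_0\}$ is known), with payoff functional

$$J(U_1,U_2) = \tfrac{1}{2}E\Big\{X^*(T)Q_3X(T) + \int_0^T U_1^*(t)Q_1(t)U_1(t)\,dt - \int_0^T U_2^*(t)Q_2(t)U_2(t)\,dt\Big\} \qquad (1.2)$$

(where $Q_1(t)$, $Q_2(t)$, and $Q_3$ are positive definite symmetric matrices, and the asterisk denotes vector or matrix transpose), and observation equations

$$Z_1(t) = H_1(t)X(t) + \eta_1(t), \qquad Z_2(t) = H_2(t)X(t) + \eta_2(t) \qquad (1.3)$$

may be simplified by a change of variables. Defining $x'(t) = Q_3^{1/2}X(t)$, $U_1'(t) = Q_1^{1/2}(t)U_1(t)$, and $U_2'(t) = Q_2^{1/2}(t)U_2(t)$, the system equation becomes

$$\dot{x}'(t) = Q_3^{1/2}F(t)Q_3^{-1/2}x'(t) - Q_3^{1/2}G_1(t)Q_1^{-1/2}(t)U_1'(t) + Q_3^{1/2}G_2(t)Q_2^{-1/2}(t)U_2'(t) \qquad (1.7)$$

Or, defining new matrices

$$F'(t) = Q_3^{1/2}F(t)Q_3^{-1/2} \qquad (1.8)$$

$$G_1'(t) = Q_3^{1/2}G_1(t)Q_1^{-1/2}(t) \qquad (1.9)$$

$$G_2'(t) = Q_3^{1/2}G_2(t)Q_2^{-1/2}(t) \qquad (1.10)$$

the system equation becomes

$$\dot{x}'(t) = F'(t)x'(t) - G_1'(t)U_1'(t) + G_2'(t)U_2'(t) \qquad (1.1A)$$

the payoff functional becomes

$$J(U_1',U_2') = \tfrac{1}{2}E\Big\{x'^*(T)x'(T) + \int_0^T U_1'^*(t)U_1'(t)\,dt - \int_0^T U_2'^*(t)U_2'(t)\,dt\Big\} \qquad (1.2A)$$

and the observation equations are

$$Z_1(t) = H_1(t)Q_3^{-1/2}x'(t) + \eta_1(t) = H_1'(t)x'(t) + \eta_1(t)$$

$$Z_2(t) = H_2(t)Q_3^{-1/2}x'(t) + \eta_2(t) = H_2'(t)x'(t) + \eta_2(t) \qquad (1.3A)$$

where $H_i'(t) = H_i(t)Q_3^{-1/2}$. In view of the possibility of making these transformations, we may consider (1.1A), (1.2A), and (1.3A) to be a general problem formulation. Further generalization is possible, however.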
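A quick scalar check of this change of variables may help. In the hypothetical Python helper below every quantity is a scalar, so the matrix square roots reduce to ordinary square roots, and Q3 is held constant as the transformation requires; the function returns pairs of quantities that must agree before and after the transformation (state derivative, each quadratic weight, and the noiseless part of one observation).

```python
import math


def transformed_coefficients(F, G1, G2, H1, Q1, Q2, Q3):
    """Scalar stand-ins for (1.8)-(1.10) and H1' = H1 Q3^{-1/2}.
    Q3 is assumed constant so that x' = Q3^{1/2} X adds no derivative term."""
    s1, s2, s3 = math.sqrt(Q1), math.sqrt(Q2), math.sqrt(Q3)
    return {"F": s3 * F / s3, "G1": s3 * G1 / s1, "G2": s3 * G2 / s2, "H1": H1 / s3}


def compare_terms(X, U1, U2, F, G1, G2, H1, Q1, Q2, Q3):
    """Return (original, transformed) pairs that the change of variables
    should leave equal."""
    c = transformed_coefficients(F, G1, G2, H1, Q1, Q2, Q3)
    x_p, u1_p, u2_p = math.sqrt(Q3) * X, math.sqrt(Q1) * U1, math.sqrt(Q2) * U2
    return {
        # Q3^{1/2} dX/dt from (1.1) must equal dx'/dt from (1.1A)
        "dynamics": (math.sqrt(Q3) * (F * X - G1 * U1 + G2 * U2),
                     c["F"] * x_p - c["G1"] * u1_p + c["G2"] * u2_p),
        "terminal weight": (X * Q3 * X, x_p * x_p),
        "player 1 weight": (U1 * Q1 * U1, u1_p * u1_p),
        "player 2 weight": (U2 * Q2 * U2, u2_p * u2_p),
        "observation": (H1 * X, c["H1"] * x_p),
    }
```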


Note that the solution to (1.1A) may be written (dropping the primes) as

$$X(t) = \Phi(t)X_0 - \int_0^t \Phi(t)\Phi^{-1}(\tau)G_1(\tau)U_1(\tau)\,d\tau + \int_0^t \Phi(t)\Phi^{-1}(\tau)G_2(\tau)U_2(\tau)\,d\tau \qquad (1.11)$$

where $\Phi(t)$ is the transition matrix of (1.1A), with $\Phi(0) = I$. If we now define the integral operators $T_1$ and $T_2$ by

$$(T_1U_1)(t) = \int_0^t \Phi(t)\Phi^{-1}(\tau)G_1(\tau)U_1(\tau)\,d\tau \qquad (1.12)$$

$$(T_2U_2)(t) = \int_0^t \Phi(t)\Phi^{-1}(\tau)G_2(\tau)U_2(\tau)\,d\tau \qquad (1.13)$$

then equation (1.11) may be written as

$$X(t) = \Phi(t)X_0 - (T_1U_1)(t) + (T_2U_2)(t) \qquad (1.14)$$

and, at the final time,

$$X(T) = \Phi(T)X_0 - (T_1U_1)(T) + (T_2U_2)(T) \qquad (1.15)$$

Note that $\Phi(T)X_0$ is the predicted miss distance under the condition of no control being applied by either player.

We shall henceforth drop the argument T whenever t = T, so it will be understood that when we write

$$X = \Phi X_0 - T_1U_1 + T_2U_2 \qquad (1.16)$$

this is equivalent to (1.15), and when the argument t is intended, we shall use the form (1.14).

The first term in the payoff functional (1.2A) may thus be written as

$$X^*(T)X(T) = (\Phi X_0 - T_1U_1 + T_2U_2,\ \Phi X_0 - T_1U_1 + T_2U_2) \qquad (1.17)$$

(where $(\cdot,\cdot)$ here denotes the inner product in Euclidean space). The other terms in the payoff functional may similarly be expressed as inner products in the Hilbert spaces formed as finite Cartesian products of copies of the space $L_2(T)$. Hence, the payoff functional may be written as

$$J(U_1,U_2) = \tfrac{1}{2}E\{(\Phi X_0 - T_1U_1 + T_2U_2,\ \Phi X_0 - T_1U_1 + T_2U_2) + (U_1,U_1) - (U_2,U_2)\} \qquad (1.18)$$

We wish to find the $U_1$ and $U_2$ which minimaximize $J(U_1,U_2)$. We require that these be functions of only the observables $Z_1$ and $Z_2$ and the known statistics of $X_0$, $\eta_1$, and $\eta_2$, with the specific functional forms to be determined.
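The operator notation can be checked numerically. The sketch below (Python, hypothetical function names) assumes a scalar state with constant f, g1, g2, so that Φ(T)Φ^{-1}(τ) = e^{f(T-τ)}; it evaluates X(T) from (1.15) by quadrature of (1.12) and (1.13) and compares the result with a direct Runge-Kutta integration of the state equation for arbitrary test inputs U1 and U2.

```python
import math


def terminal_state_operator_form(x0, f, g1, g2, u1, u2, T, n=2000):
    """Scalar (1.15): X(T) = Phi(T) x0 - (T1 U1)(T) + (T2 U2)(T), with constant
    f, g1, g2 so that Phi(T) Phi^{-1}(tau) = exp(f (T - tau)).
    The integrals (1.12)-(1.13) are approximated by the midpoint rule."""
    dt = T / n
    t1u1 = 0.0
    t2u2 = 0.0
    for k in range(n):
        tau = (k + 0.5) * dt
        kernel = math.exp(f * (T - tau))
        t1u1 += kernel * g1 * u1(tau) * dt
        t2u2 += kernel * g2 * u2(tau) * dt
    return math.exp(f * T) * x0 - t1u1 + t2u2


def terminal_state_simulated(x0, f, g1, g2, u1, u2, T, n=2000):
    """Integrate dX/dt = f X - g1 U1 + g2 U2 directly with classical RK4."""
    def rhs(t, x):
        return f * x - g1 * u1(t) + g2 * u2(t)

    dt = T / n
    x = x0
    for k in range(n):
        t = k * dt
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x
```

For example, with x0 = 1, f = -0.4, g1 = 1, g2 = 0.6, T = 3, U1(t) = sin t and U2(t) = t/2, the two functions agree to roughly six decimal places.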
To acquire some insight into the nature of the problem, we first solve the deterministic version;¹ i.e., we assume both players know precisely the initial condition $X_0$. We also assume both players are able to monitor the state continuously during the progress of the game. We may drop the expected-value notation for the time being, and thus express the payoff as

$$J(U_1,U_2) = \tfrac{1}{2}\{(\Phi X_0 - T_1U_1 + T_2U_2,\ \Phi X_0 - T_1U_1 + T_2U_2) + (U_1,U_1) - (U_2,U_2)\} \qquad (1.19)$$

To minimaximize this quantity with respect to $U_1$ and $U_2$, we form the functional derivatives of $J(U_1,U_2)$ with respect to $U_1$ and $U_2$ and set these derivatives equal to zero. Thus

$$\frac{\partial J(U_1,U_2)}{\partial U_1} = U_1 - T_1^*\Phi X_0 - T_1^*T_2U_2 + T_1^*T_1U_1 = 0 \qquad (1.20)$$

$$\frac{\partial J(U_1,U_2)}{\partial U_2} = -U_2 + T_2^*\Phi X_0 - T_2^*T_1U_1 + T_2^*T_2U_2 = 0 \qquad (1.21)$$

(where the asterisk here denotes the adjoint operator). We see that for these equations to be true $U_1$ must be in the range of $T_1^*$ and $U_2$ in the range of $T_2^*$, and so we may write $U_1 = T_1^*X_1$ and $U_2 = T_2^*X_2$. Substituting these expressions into the original equations (1.20) and (1.21), we have

$$X_1 = \Phi X_0 + T_2U_2 - T_1U_1 = X_2$$

which implies

$$X_1 = \Phi X_0 + T_2T_2^*X_1 - T_1T_1^*X_1$$

Thus

$$[I + T_1T_1^* - T_2T_2^*]X_1 = \Phi X_0$$

or

$$X_1 = [I + T_1T_1^* - T_2T_2^*]^{-1}\Phi X_0$$

when the indicated inverse exists. Thus we may write

$$U_1 = T_1^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi X_0$$

$$U_2 = T_2^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi X_0$$

We note that the form of (1.19) is quite general and that the results above are valid for any abstract Hilbert-space functional of this form.

¹The derivation given here is due to Porter [22].
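To see the saddle-point property of this solution concretely, the Python sketch below discretizes a scalar version of the game on a uniform grid (an assumption made purely for illustration): T1 and T2 become row vectors, T_iT_i* becomes a scalar, and the formulas for U1 and U2 above are evaluated directly. The `payoff` helper is a discretized (1.19); perturbing U1 should raise it and perturbing U2 should lower it, provided T2T2* < 1 so that the maximization over U2 is well posed. Function names are hypothetical.

```python
import math


def discretize(f, g, T, n):
    """Row vector a with (T_i u)(T) ~ sum_k a[k] u[k] for a control that is
    constant on each of n cells (scalar state, constant f and g)."""
    dt = T / n
    return [math.exp(f * (T - (k + 0.5) * dt)) * g * dt for k in range(n)], dt


def saddle_point(x0, f, g1, g2, T, n=400):
    """Evaluate U_i = T_i^* [I + T1 T1^* - T2 T2^*]^{-1} Phi X0 on the grid."""
    a1, dt = discretize(f, g1, T, n)
    a2, _ = discretize(f, g2, T, n)
    # With (u, v) = sum u[k] v[k] dt, the adjoint is (T_i^* y)[k] = a_i[k] y / dt.
    t1t1 = sum(a * a for a in a1) / dt   # T1 T1^*, a scalar for a scalar state
    t2t2 = sum(a * a for a in a2) / dt
    miss = math.exp(f * T) * x0          # predicted miss distance Phi X0
    x1 = miss / (1.0 + t1t1 - t2t2)      # X1 (= X2)
    u1 = [a / dt * x1 for a in a1]       # U1 = T1^* X1
    u2 = [a / dt * x1 for a in a2]       # U2 = T2^* X1
    return a1, a2, dt, miss, x1, u1, u2


def payoff(a1, a2, dt, miss, u1, u2):
    """Discretized (1.19); also returns the terminal state X(T)."""
    x_T = miss - sum(a * u for a, u in zip(a1, u1)) + sum(a * u for a, u in zip(a2, u2))
    j = 0.5 * (x_T * x_T + dt * sum(u * u for u in u1) - dt * sum(u * u for u in u2))
    return j, x_T
```

One by-product is visible in the sketch: the terminal state produced by the saddle-point controls equals X1 itself.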


When $T_1$ and $T_2$ are given by (1.12) and (1.13), then, writing $\Phi(T,s) = \Phi(T)\Phi^{-1}(s)$,

$$T_1^*[I + T_1T_1^* - T_2T_2^*]^{-1}(\cdot) = G_1^*(t)\Phi^*(T,t)\Big[I + \int_0^T \Phi(T,s)G_1(s)G_1^*(s)\Phi^*(T,s)\,ds - \int_0^T \Phi(T,s)G_2(s)G_2^*(s)\Phi^*(T,s)\,ds\Big]^{-1}(\cdot)$$

If we define

$$K_1(t;\sigma,T) = G_1^*(t)\Phi^*(T,t)\Big[I + \int_\sigma^T \Phi(T,s)G_1(s)G_1^*(s)\Phi^*(T,s)\,ds - \int_\sigma^T \Phi(T,s)G_2(s)G_2^*(s)\Phi^*(T,s)\,ds\Big]^{-1}\Phi(T,\sigma)$$

$$K_2(t;\sigma,T) = G_2^*(t)\Phi^*(T,t)\Big[I + \int_\sigma^T \Phi(T,s)G_1(s)G_1^*(s)\Phi^*(T,s)\,ds - \int_\sigma^T \Phi(T,s)G_2(s)G_2^*(s)\Phi^*(T,s)\,ds\Big]^{-1}\Phi(T,\sigma)$$

then we may write

$$U_1(t) = K_1(t;0,T)X_0$$

$$U_2(t) = K_2(t;0,T)X_0$$

Here the arguments 0 and T of $K_1$ and $K_2$ indicate the initial and final times t = 0 and t = T, respectively.

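Having seen this solution, we may state the problem somewhat differently: if we require that $U_1$ and $U_2$ be of the form $U_i = K_iX_0$, then what are the transformations $K_1$ and $K_2$ which minimaximize the functional $J(U_1,U_2)$? In other words, we wish to find the $K_1$ and $K_2$ which achieve $\min_{K_1}\max_{K_2} J(K_1,K_2)$, where

$$J(K_1,K_2) = \tfrac{1}{2}\{(\Phi X_0 - T_1K_1X_0 + T_2K_2X_0,\ \Phi X_0 - T_1K_1X_0 + T_2K_2X_0) + (K_1X_0,K_1X_0) - (K_2X_0,K_2X_0)\}$$

Now $K_1$ and $K_2$ are linear transformations from the Euclidean space containing $X_0$ to the Hilbert spaces which are the domains of $T_1$ and $T_2$, respectively. We form the functional derivatives of $J(K_1,K_2)$ with respect to $K_1$ and $K_2$ as follows: let $\delta_1$ and $\delta_2$ be arbitrary linear transformations which have the same domains and ranges as $K_1$ and $K_2$, respectively, and let $\varepsilon_1\delta_1$ and $\varepsilon_2\delta_2$ be variations about $K_1$ and $K_2$, respectively, where $\varepsilon_1$ and $\varepsilon_2$ are scalars. Then, remembering that the predicted miss distance is $\Phi X_0 = \Phi(T,0)X_0$,

$$J(K_1 + \varepsilon_1\delta_1,K_2) = \tfrac{1}{2}\{([\Phi - T_1(K_1 + \varepsilon_1\delta_1) + T_2K_2]X_0,\ [\Phi - T_1(K_1 + \varepsilon_1\delta_1) + T_2K_2]X_0) + ((K_1 + \varepsilon_1\delta_1)X_0,\ (K_1 + \varepsilon_1\delta_1)X_0) - (K_2X_0,K_2X_0)\} \qquad (1.37)$$

and, upon first expanding the above expression and then subtracting $J(K_1,K_2)$, we have

$$J(K_1 + \varepsilon_1\delta_1,K_2) - J(K_1,K_2) = -\varepsilon_1([\Phi - T_1K_1 + T_2K_2]X_0,\ T_1\delta_1X_0) + \varepsilon_1(K_1X_0,\delta_1X_0) + \tfrac{\varepsilon_1^2}{2}[(T_1\delta_1X_0,T_1\delta_1X_0) + (\delta_1X_0,\delta_1X_0)] \qquad (1.38)$$

Dividing this expression by $\varepsilon_1$ and letting $\varepsilon_1$ approach zero, we have the functional derivative of $J(K_1,K_2)$ with respect to $K_1$, which is

$$\frac{\partial J(K_1,K_2)}{\partial K_1} = -([\Phi - T_1K_1 + T_2K_2]X_0,\ T_1\delta_1X_0) + (K_1X_0,\delta_1X_0) \qquad (1.39)$$

which may be written, using the properties of the adjoint operator and combining terms, as

$$\frac{\partial J(K_1,K_2)}{\partial K_1} = -(T_1^*[\Phi - T_1K_1 + T_2K_2]X_0 - K_1X_0,\ \delta_1X_0) \qquad (1.40)$$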
A necessary condition, then, that $K_1$ be a minimizing transformation is

$$(T_1^*[\Phi - T_1K_1 + T_2K_2]X_0 - K_1X_0,\ \delta_1X_0) = 0 \qquad (1.41)$$

Now, since $\delta_1$ was taken to be an arbitrary linear transformation, (1.41) implies that the vector

$$(T_1^*[\Phi - T_1K_1 + T_2K_2] - K_1)X_0 \qquad (1.42)$$

is orthogonal to any linear transformation of $X_0$, which in turn implies that the transformation

$$T_1^*[\Phi - T_1K_1 + T_2K_2] - K_1 \qquad (1.43)$$

is the null transformation. Thus for any vector $X_0$

$$T_1^*[\Phi - T_1K_1 + T_2K_2]X_0 = K_1X_0 \qquad (1.44)$$

This can be true only if $K_1X_0$ is in the range of $T_1^*$; hence, we write $K_1X_0 = T_1^*X_1$ for some $X_1$ in the domain of $T_1^*$.

Following a line of reasoning similar to the above, after differentiating $J(K_1,K_2)$ with respect to $K_2$, we are led to

$$T_2^*[\Phi - T_1K_1 + T_2K_2]X_0 = K_2X_0 \qquad (1.45)$$

Hence, $K_2X_0$ is in the range of $T_2^*$. We write $K_2X_0 = T_2^*X_2$. Substituting $T_1^*X_1$ and $T_2^*X_2$ for $K_1X_0$ and $K_2X_0$, respectively, in the above pair of equations gives

$$T_1^*\Phi X_0 - T_1^*T_1T_1^*X_1 + T_1^*T_2T_2^*X_2 = T_1^*X_1 \qquad (1.46)$$

$$T_2^*\Phi X_0 - T_2^*T_1T_1^*X_1 + T_2^*T_2T_2^*X_2 = T_2^*X_2 \qquad (1.47)$$

which are satisfied if

$$\Phi X_0 = T_1T_1^*X_1 - T_2T_2^*X_2 + X_1 \qquad (1.48)$$

$$\Phi X_0 = T_1T_1^*X_1 - T_2T_2^*X_2 + X_2 \qquad (1.49)$$

Subtracting (1.49) from (1.48) gives $X_1 = X_2$, so that

$$\Phi X_0 = [I + T_1T_1^* - T_2T_2^*]X_1 \qquad (1.50)$$

$$X_1 = [I + T_1T_1^* - T_2T_2^*]^{-1}\Phi X_0 \qquad (1.51)$$

if the indicated inverse exists. Then, since $K_1X_0 = T_1^*X_1$ for every $X_0$, we have

$$K_1 = T_1^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi \qquad (1.52)$$

and similarly

$$K_2 = T_2^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi \qquad (1.53)$$
(1.53) 
These expressions are what we expected: knowing that the overall optimal strategies are linear transformations of $X_0$, we are not surprised that when we ask which linear transformations of $X_0$ are optimal we get as an answer the same (the overall optimal) transformations. However, the technique just employed can provide optimal linear strategies even when the form of the overall optimal strategies is not known.

Expressions (1.52) and (1.53) are open-loop optimal control strategies. Since we have temporarily assumed that the players are both able to monitor the state continuously, we may convert (1.52) and (1.53) to closed-loop, or feedback-type, strategies by replacing $\Phi X_0$ with $\Phi(T)\Phi^{-1}(t)X(t)$, that is, by replacing $K_1(t;0,T)X_0$ with $K_1(t;t,T)X(t)$ and $K_2(t;0,T)X_0$ with $K_2(t;t,T)X(t)$. In this case, as the game progresses, the players constantly regard the present instant as the initial time of a new game and form their control functions accordingly.
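For a scalar version of (1.1A) with constant f, g1, g2, the closed-loop gain reduces to K1(t;t,T) = g1 P(t) with P(t) = Φ(T,t)²/[1 + (g1² - g2²)∫_t^T Φ(T,s)² ds]. Differentiating this quotient in t shows that P satisfies the game Riccati equation -dP/dt = 2fP - (g1² - g2²)P², P(T) = 1; that Riccati form is our addition for comparison and is not stated in this section. The Python sketch below, with hypothetical names and assumed parameter values, evaluates both expressions numerically as a consistency check.

```python
import math


def k1_closed_loop(t, f, g1, g2, T, n=4000):
    """Scalar K1(t;t,T) = g1 Phi(T,t) [1 + int_t^T Phi(T,s)^2 (g1^2 - g2^2) ds]^{-1} Phi(T,t)
    for constant f, g1, g2; the integral uses the midpoint rule."""
    h = (T - t) / n
    integral = h * sum(math.exp(2.0 * f * (T - (t + (k + 0.5) * h))) for k in range(n))
    phi = math.exp(f * (T - t))
    return g1 * phi * phi / (1.0 + (g1 * g1 - g2 * g2) * integral)


def k1_riccati(t, f, g1, g2, T, n=2000):
    """g1 P(t), where -dP/dt = 2 f P - (g1^2 - g2^2) P^2 and P(T) = 1,
    integrated backward from T to t with classical RK4."""
    c = g1 * g1 - g2 * g2

    def minus_dp_dt(p):
        return 2.0 * f * p - c * p * p

    h = (T - t) / n
    p = 1.0
    for _ in range(n):
        # In reversed time s = T - t, dP/ds = minus_dp_dt(P).
        k1 = minus_dp_dt(p)
        k2 = minus_dp_dt(p + h * k1 / 2)
        k3 = minus_dp_dt(p + h * k2 / 2)
        k4 = minus_dp_dt(p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g1 * p
```

At t = T both functions return g1, since the integral vanishes and P(T) = 1.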


Chapter  2 

THE  STOCHASTIC  GAME  PROBLEM 


2.1  Preliminary  Remarks 

In some cases the players are not able to monitor the state continuously, but are able to make noisy observations of the state in the form given by (1.3A). If they are given only the statistical information about the initial state $X_0$ stated with (1.1), then presumably they will be able to take advantage of their noisy measurements to improve the quality of their play over that of strictly open-loop strategies.

Thus, we must find stochastically optimal strategies: methods by which the players process their observed data so that the expected value of the payoff functional is minimaximized with respect to the data-processing methods. The players must find strategies which are optimal within the constraints of their limited information. This information includes the mean $\bar{X}_0$ and covariance of the initial state, plus the observations described by (1.3A). These quantities must be combined functionally to form the strategies $U_1$ and $U_2$. What the functional form should be will be determined by certain criteria of desirability, one of which is the so-called "certainty-coincidence" principle discussed by Willman [28]. This is simply a requirement that the stochastic strategies coincide with the deterministic strategies when the noise variances go to zero.

Other criteria are simplicity and physical realizability. Accordingly, we will require that the functional form of the strategies be a linear combination of the known quantities and the observables.


2.2 A Heuristic Justification for the Assumption of Linear Strategies

An interesting aspect of the selection of the form of the strategies for discrete-time games has been developed by K. Bley [6] and is extended here to the continuous-time case. We have hypothesized a criterion function of the form

$$J = E\Big\{X^*(T)X(T) + \int_0^T (U_1^*U_1 - U_2^*U_2)\,dt\Big\} \qquad (2.1)$$

and a system equation which may be rewritten

$$dX = FX\,dt - G_1U_1\,dt + G_2U_2\,dt \qquad (2.2)$$

We define

$$g = \min_{U_1(t)}\max_{U_2(t)} J \qquad (2.3)$$

For a given set of noise statistics and for minimax control strategies, the payoff will depend on $t_0 = 0$ and $X(t_0) = X_0$; call this payoff $f(X(t_0),t_0)$. We write the minimax payoff as

$$g = \min_{U_1}\max_{U_2} E\{f(X(0),0)\} \qquad (2.4)$$
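Breaking the time interval [0,T] into two subintervals [0,Δ] and [Δ,T], we may then write the criterion functional as

$$g = \min_{U_1}\max_{U_2} E\Big\{\int_0^\Delta (U_1^*U_1 - U_2^*U_2)\,dt + X^*(T)X(T) + \int_\Delta^T (U_1^*U_1 - U_2^*U_2)\,dt\Big\}$$

$$= \min_{U_1}\max_{U_2} E\Big\{\int_0^\Delta (U_1^*U_1 - U_2^*U_2)\,dt + \min_{U_1}\max_{U_2} E\Big\{X^*(T)X(T) + \int_\Delta^T (U_1^*U_1 - U_2^*U_2)\,dt\Big\}\Big\} \qquad (2.5)$$

Now

$$\min_{U_1}\max_{U_2} E\Big\{X^*(T)X(T) + \int_\Delta^T (U_1^*U_1 - U_2^*U_2)\,dt\Big\} = E\{f(X(0) + \Delta X,\ \Delta)\} \qquad (2.6)$$

We expand f in a Taylor series about (X(0),0):

$$f(X(0) + \Delta X,\ \Delta) = f(X(0),0) + \frac{\partial f}{\partial t}\Delta + \frac{\partial f}{\partial X}\Delta X + \cdots \qquad (2.7)$$

so

$$g = \min_{U_1}\max_{U_2} E\Big\{\int_0^\Delta (U_1^*U_1 - U_2^*U_2)\,dt + f(X(0),0) + \frac{\partial f}{\partial t}\Delta + \frac{\partial f}{\partial X}\Delta X\Big\} \qquad (2.8)$$

Since $\min_{U_1}\max_{U_2} E\{f(X(0),0)\} = g$, we may write the above as

$$0 = \min_{U_1}\max_{U_2} E\Big\{\int_0^\Delta (U_1^*U_1 - U_2^*U_2)\,dt + \frac{\partial f}{\partial t}\Delta + \frac{\partial f}{\partial X}\Delta X\Big\} \qquad (2.9)$$

Then, writing $\Delta X = FX\Delta - G_1U_1\Delta + G_2U_2\Delta$ and substituting in the above,

$$0 = \min_{U_1}\max_{U_2} E\Big\{\int_0^\Delta (U_1^*U_1 - U_2^*U_2)\,dt + \frac{\partial f}{\partial t}\Delta + \frac{\partial f}{\partial X}(FX - G_1U_1 + G_2U_2)\Delta\Big\} \qquad (2.10)$$

Approximating the integral by $(U_1^*U_1 - U_2^*U_2)\Delta$, we have

$$0 = \min_{U_1}\max_{U_2} E\Big\{(U_1^*U_1 - U_2^*U_2)\Delta + \frac{\partial f}{\partial t}\Delta + \frac{\partial f}{\partial X}(FX - G_1U_1 + G_2U_2)\Delta\Big\} \qquad (2.11)$$

Dividing both sides by Δ, we have

$$0 = \min_{U_1}\max_{U_2} E\Big\{U_1^*U_1 - U_2^*U_2 + \frac{\partial f}{\partial t} + \frac{\partial f}{\partial X}(FX - G_1U_1 + G_2U_2)\Big\} \qquad (2.12)$$

Since $\min_{U_1}\max_{U_2} E\{f(X(0),0)\} = g$,

$$\frac{\partial g}{\partial t_0} = \frac{\partial}{\partial t_0}\min_{U_1}\max_{U_2} E\{f(X(0),0)\} = \min_{U_1}\max_{U_2} E\Big\{\frac{\partial f}{\partial t}(X(0),0)\Big\}$$

Therefore,

$$-\frac{\partial g}{\partial t_0} = \min_{U_1}\max_{U_2} E\Big\{U_1^*U_1 - U_2^*U_2 + \frac{\partial f}{\partial X}(FX - G_1U_1 + G_2U_2)\Big\} \qquad (2.14)$$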

We now must make an assumption about the form of f(X,t); therefore we choose some general form, such as¹

$$f(X,t) = X^*X_0(t)X + U_1^*X_1(t)X + U_2^*X_2(t)X + \varphi(t) \qquad (2.15)$$

where the $X_i(t)$, i = 0, 1, 2, are unspecified matrices. Thus,

$$\frac{\partial f}{\partial X} = X^*X_0(t) + U_1^*X_1(t) + U_2^*X_2(t) \qquad (2.16)$$

Utilizing this expression for $\partial f/\partial X$ in (2.14), we have

$$-\frac{\partial g}{\partial t_0} = \min_{U_1}\max_{U_2} E\{U_1^*U_1 - U_2^*U_2 + (X^*X_0 + U_1^*X_1 + U_2^*X_2)(FX - G_1U_1 + G_2U_2)\} \qquad (2.17)$$

After collecting terms, the right-hand side of (2.17) may be rewritten as

$$\min_{U_1}\max_{U_2} E\{X^*Q_3X + U_1^*Q_1U_1 + U_2^*Q_2U_2 + X^*Q_4U_1 + X^*Q_5U_2 + U_1^*Q_6U_2\} \qquad (2.18)$$

where

$$Q_1 = \mathrm{SYM}\{I - X_1G_1\}, \qquad Q_2 = \mathrm{SYM}\{-I + X_2G_2\}, \qquad Q_3 = \mathrm{SYM}\{X_0F\}$$

$$Q_4 = F^*X_1^* - X_0G_1, \qquad Q_5 = F^*X_2^* + X_0G_2, \qquad Q_6 = X_1G_2 - G_1^*X_2^*$$

and where SYM{A} denotes the symmetrized version, $(A + A^*)/2$, of the matrix A. When differentiating the expression in brackets with respect to $U_1$, since $U_1$ is a minimizing control, we have

$$E\{U_1^*Q_1 + X^*Q_4 + U_2^*Q_6^*\} = 0 \qquad (2.20)$$

Similarly,

$$E\{U_2^*Q_2 + X^*Q_5 + U_1^*Q_6\} = 0 \qquad (2.21)$$

¹For a detailed examination of this subject see the dissertation of K. Bley [6], where the discrete-time version of the problem is analyzed.

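Now because $U_1$ is player 1's control, $U_1$ must be based only on the observation $Z_1$; similarly, $U_2$ must be based solely on $Z_2$. And since it is a property of conditional expectations that $E\{X\} = E\{E\{X \mid Z\}\}$ for random variables X and Z, we may write the above equations (2.20) and (2.21) as

$$E\{E\{U_1^*Q_1 + X^*Q_4 + U_2^*Q_6^* \mid Z_1\}\} = 0 \qquad (2.22)$$

$$E\{E\{U_2^*Q_2 + X^*Q_5 + U_1^*Q_6 \mid Z_2\}\} = 0 \qquad (2.23)$$

Since $U_1$ may be chosen separately for each value of $Z_1$ (and $U_2$ for each value of $Z_2$), the inner conditional expectations must themselves vanish. Furthermore, since $U_1 = U_1(Z_1)$ and $U_2 = U_2(Z_2)$, and because the taking of conditional expectations is a linear operation, we may write

$$U_1^*Q_1 + E\{X^* \mid Z_1\}Q_4 + E\{U_2^* \mid Z_1\}Q_6^* = 0 \qquad (2.24)$$

$$U_2^*Q_2 + E\{X^* \mid Z_2\}Q_5 + E\{U_1^* \mid Z_2\}Q_6 = 0 \qquad (2.25)$$

or

$$U_1 = -Q_1^{-1}Q_4^*E\{X \mid Z_1\} - Q_1^{-1}Q_6E\{U_2 \mid Z_1\} \qquad (2.26)$$

and

$$U_2 = -Q_2^{-1}Q_5^*E\{X \mid Z_2\} - Q_2^{-1}Q_6^*E\{U_1 \mid Z_2\} \qquad (2.27)$$

Now if we denote by $T_1(\cdot)$ the linear operation

$$T_1(\cdot) = E\{\cdot \mid Z_1\} \qquad (2.28)$$

and similarly for $T_2(\cdot)$,

$$T_2(\cdot) = E\{\cdot \mid Z_2\} \qquad (2.29)$$

the equations then read

$$U_1 = -Q_1^{-1}Q_4^*T_1X - Q_1^{-1}Q_6T_1U_2 \qquad (2.30)$$

$$U_2 = -Q_2^{-1}Q_5^*T_2X - Q_2^{-1}Q_6^*T_2U_1 \qquad (2.31)$$

and, substituting the second equation into the first,

$$U_1 = Q_1^{-1}Q_6T_1Q_2^{-1}Q_6^*T_2U_1 + Q_1^{-1}Q_6T_1Q_2^{-1}Q_5^*T_2X - Q_1^{-1}Q_4^*T_1X \qquad (2.32)$$

The expected-value operators commute with the (deterministic) matrices $Q_i$, i = 1, 2, ..., 6, so

$$(I - Q_1^{-1}Q_6Q_2^{-1}Q_6^*T_1T_2)U_1 = Q_1^{-1}[Q_6Q_2^{-1}Q_5^*T_1T_2 - Q_4^*T_1]X \qquad (2.33)$$

If the norm of the operator $Q_1^{-1}Q_6Q_2^{-1}Q_6^*T_1T_2$ is less than unity, a Neumann expansion gives the inverse of $I - Q_1^{-1}Q_6Q_2^{-1}Q_6^*T_1T_2$, so

$$U_1 = \sum_{n=0}^{\infty}(Q_1^{-1}Q_6Q_2^{-1}Q_6^*T_1T_2)^n\,Q_1^{-1}[Q_6Q_2^{-1}Q_5^*T_1T_2 - Q_4^*T_1]X \qquad (2.34)$$

The above expression gives $U_1$ in terms of conditional expectations of the state vector X. A similar expression exists for $U_2$. We have considered only the starting point, but any point may be considered the starting point of a new game.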
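The Neumann expansion in (2.34) is the operator analogue of a geometric series. Because the operator here combines matrices with conditional-expectation operators, the following Python sketch only illustrates the expansion itself on a finite-dimensional stand-in: a 2×2 matrix A plays the role of Q1^{-1}Q6Q2^{-1}Q6*T1T2, and the truncated series is compared with a direct inverse of I - A. The helper names and the truncation length are illustrative choices.

```python
def mat_mul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def neumann_inverse(A, terms=200):
    """Approximate (I - A)^{-1} by the truncated series I + A + A^2 + ... ;
    the series converges when the spectral radius of A is below one
    (a norm bound ||A|| < 1 is a convenient sufficient condition)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms):
        power = mat_mul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def inverse_2x2(M):
    """Direct inverse of a 2x2 matrix, used only to check the series."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]
```

For A = [[0.3, 0.1], [-0.2, 0.4]], whose eigenvalues have modulus about 0.37, the truncated series agrees with the direct inverse of I - A to machine precision.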

These expressions for the minimax strategies in terms of conditional expectations of the state indicate that when the process statistics are Gaussian the optimal strategies are linear (affine), since the conditional expectation of the state is a linear transformation of the observations. We might interpret this to mean that, when pitted against an opponent who is known to use linear strategies, the optimal counterstrategy is itself linear. However, the proof makes such essential use of the Gaussian character of the process statistics that it becomes invalid when either player at any instant uses a nonlinear control which would destroy the Gaussian probability distribution. When the Gaussian distribution is thought to be a reasonable approximation to the true distribution, the restriction to linear strategies is perhaps justifiable. Furthermore, since the solution to the deterministic problem is known to involve linear state feedback as control strategies, it is intuitively reasonable to believe that for small uncertainties in the state information the linear certainty-equivalent strategies cannot be too far from optimal. Thus, the class of general linear control strategies must contain strategies which, if not overall optimal, are at least bounded by the certainty-equivalent strategies in payoff. In practical situations, if the system designer has some confidence that a linear strategy will give nearly optimal performance, he can justify restriction of his design to linear strategies on the basis of computational feasibility considerations.

A final word about the form of the strategies: since any strategy which minimizes the expected value of the payoff must in some way depend on the probability distribution of the state variables, the task of selecting a strategy which is generally optimal against any form of opposing strategy is rather hopeless, because that opposing strategy may alter the probability distribution of the state variables in such a way as to give each player a different notion of what that probability distribution is. It is with a view to the futility of searching for the perfectly optimal strategy that we gladly restrict our attention to the task of finding an optimal linear strategy. We shall soon see that even this restriction is not sufficient to ensure that the resulting control functionals are computationally feasible.

We have required that the strategies be simple, linear, and physically realizable. A pair of general expressions meeting these requirements is

$$U_1 = M_1\hat{X}_1 + N_1Z_1 \qquad (2.35)$$

$$U_2 = M_2\hat{X}_2 + N_2Z_2 \qquad (2.36)$$

where $\hat{X}_1$ and $\hat{X}_2$ are the conditional means of the state computed by players 1 and 2, respectively. A straightforward approach to the game problem might be to assume strategies of this form, to substitute these expressions in the payoff functional, and to proceed with the optimization over the class of linear functionals $M_1$, $N_1$, $M_2$, and $N_2$. Then, if the certainty-equivalent strategy were optimal, we would expect to find that $M_1 = K_1$ and $M_2 = K_2$, while $N_1$ and $N_2$ are zero. While the proposed approach is in fact a poor one if useful solutions to the game problem are desired, some revealing facts are brought to light by taking it, and we shall therefore do so.

But before proceeding, we point out two facts:

i) We have tacitly assumed that the conditional means $\hat{X}_1$ and $\hat{X}_2$ are computable by the players, but we have not specified how the computation would be done.

ii) We have asked that the "certainty-coincidence principle" be satisfied. Thus, in terms of the forms we have assumed, we require

$$M_1 \to K_1, \qquad M_2 \to K_2, \qquad N_1 \to 0, \qquad N_2 \to 0 \qquad (2.37)$$

as the observation noise covariances go to zero. In view of this requirement, we may rearrange (2.35) and (2.36) into a more convenient form. We write

$$U_1 = M_1\hat{X}_1 + N_1Z_1 = (M_1 + N_1H_1)\hat{X}_1 + N_1(Z_1 - H_1\hat{X}_1) = K_1[\hat{X}_1 + L_1(Z_1 - \hat{Z}_1)] \qquad (2.38)$$

where $K_1 = M_1 + N_1H_1$, $\hat{Z}_1 = H_1\hat{X}_1$, and $K_1L_1 = N_1$. Similarly, we write

$$U_2 = K_2[\hat{X}_2 + L_2(Z_2 - \hat{Z}_2)] \qquad (2.39)$$

and require $K_1$ and $K_2$ to approach the deterministic feedback gains as the observation noises go to zero. Thus, our assumed strategies have the form of linear transformations of the conditional mean plus linear operations on the residuals. The payoff functional becomes

$$J(K_1,L_1,K_2,L_2) = \tfrac{1}{2}E\Big\{\big(\Phi X_0 - T_1K_1[\hat{X}_1 + L_1(Z_1 - \hat{Z}_1)] + T_2K_2[\hat{X}_2 + L_2(Z_2 - \hat{Z}_2)],\ \Phi X_0 - T_1K_1[\hat{X}_1 + L_1(Z_1 - \hat{Z}_1)] + T_2K_2[\hat{X}_2 + L_2(Z_2 - \hat{Z}_2)]\big)$$
$$+ \big(K_1[\hat{X}_1 + L_1(Z_1 - \hat{Z}_1)],\ K_1[\hat{X}_1 + L_1(Z_1 - \hat{Z}_1)]\big) - \big(K_2[\hat{X}_2 + L_2(Z_2 - \hat{Z}_2)],\ K_2[\hat{X}_2 + L_2(Z_2 - \hat{Z}_2)]\big)\Big\} \qquad (2.40)$$

and we wish to find $K_1$, $L_1$, $K_2$, and $L_2$ which provide a saddle point of the functional (2.40).

However, before proceeding further, it is desirable to pause and develop the techniques we will need for handling such problems, i.e., finding functional derivatives of expected-value functionals. To illustrate these techniques, we will derive some well-known relationships which will be found useful later in this exposition.

2.3  Minimum  Variance  Estimation 

The  first  example  we  treat  is  that  of  minimum -variance  estimation. 
Ve  wish  to  estimate  a  vector  X  on  the  basis  of  our  observation  of 
another  vector  Z.  We  assume  knowledge  of  the  mean  X  and  the  variance 
tgg  of  X  and  of  the  covariance  of  X  and  Z,  We  also  assume  know¬ 
ledge  of  the  mean  Z  and  variance  h _ of  Z.  We  ask  that  our  estimator 

Ulk 

be  linear  and  realizable  and  that  it  obey  the  certainty-coincidence 
principle.  We  thus  assume  the  estimator  has  the  fora 


$$\hat X = \bar X + L(Z - \bar Z) \qquad (2.41)$$

where $\hat X$ denotes the estimate and $L$ is a linear operator. The estimation error is

$$\epsilon = X - \hat X = X - \bar X - L(Z - \bar Z) \qquad (2.42)$$


and the variance of the error may be written as a functional of $L$:

$$J(L) = E\{(X - \bar X - L(Z - \bar Z),\; X - \bar X - L(Z - \bar Z))\} \qquad (2.43)$$

Forming the functional derivative of this functional with respect to the operator $L$ and setting this equal to zero, we have

$$E\{(X - \bar X - L(Z - \bar Z),\; \Lambda(Z - \bar Z))\} = 0 \qquad (2.44)$$

where, as before, $\Lambda$ is any arbitrary linear operation on the observation $Z - \bar Z$. We interpret (2.44) to mean that the expression

$$X - \bar X - L(Z - \bar Z) \qquad (2.45)$$

is statistically orthogonal to any linear transformation of the observation $Z - \bar Z$. In order for this to be true, $Z - \bar Z$ must be uncorrelated with (2.45); i.e.,

$$E\{[X - \bar X - L(Z - \bar Z)](Z - \bar Z)^T\} = 0 \qquad (2.46)$$
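As a concrete check of (2.43) through (2.46), the following sketch works out the scalar case with made-up moments ($\Psi_{XX} = 4$, $\Psi_{ZZ} = 5$, $\Psi_{XZ} = 3$). The names `psi_xx` and `error_variance` are illustrative only. It confirms that the gain satisfying the orthogonality condition also minimizes the error variance.

```python
# Scalar illustration of (2.41)-(2.46). The moments are hypothetical numbers;
# means have already been removed, so X and Z stand for X - Xbar and Z - Zbar.
psi_xx = 4.0   # variance of X
psi_zz = 5.0   # variance of Z
psi_xz = 3.0   # covariance of X and Z


def error_variance(gain):
    """J(L) of (2.43): E{(X - Xbar - L(Z - Zbar))^2} expanded in moments."""
    return psi_xx - 2.0 * gain * psi_xz + gain ** 2 * psi_zz


# Scalar form of the orthogonality condition (2.46): psi_xz - L psi_zz = 0.
L_opt = psi_xz / psi_zz
orthogonality_residual = psi_xz - L_opt * psi_zz

# Brute-force check that no nearby gain gives a smaller error variance.
candidates = [L_opt + 0.01 * k for k in range(-50, 51)]
best_gain = min(candidates, key=error_variance)
```

The minimum error variance is $\Psi_{XX} - \Psi_{XZ}^2/\Psi_{ZZ}$, which is the scalar form of the familiar conditional-variance formula for Gaussian variables.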

This is an abstract form of the Wiener-Hopf equation describing the linear estimate which is optimal in the mean-square sense. It is well known that when the random variables are normally distributed the linear estimate is over-all optimal. Furthermore, since the optimal mean-square estimate is the conditional mean of the random variable to be estimated, we see that (2.41) provides us with the conditional mean of $X$ when $X$ and $Z$ are normally distributed and $L$ satisfies (2.46). We note that when the form of $L$ is specified as

$$L(Z - \bar Z) = \int_{t_0}^{t} W(t,\tau)\,[Z(\tau) - \bar Z(\tau)]\,d\tau$$

then the Wiener-Hopf equation takes its familiar form

$$\Psi_{XZ}(t,\sigma) = \int_{t_0}^{t} W(t,\tau)\,\Psi_{ZZ}(\tau,\sigma)\,d\tau, \qquad t_0 \le \sigma \le t$$
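When time is sampled, the integral becomes a sum and the weights $W$ solve an ordinary linear system. The sketch below assumes a hypothetical signal-plus-white-noise model that does not appear in the report: a signal with covariance $\rho^{|j-k|}$, noise variance $r$, and $X$ taken to be the last signal sample. It uses a small hand-written Gaussian elimination routine to solve for the weights.

```python
# Discrete analogue of the Wiener-Hopf equation
#   Psi_XZ(sigma) = sum_tau W(tau) Psi_ZZ(tau, sigma),
# solved for the weights W. Hypothetical model (not from the report):
# Z_k = S_k + v_k, k = 0..n-1, with cov(S_j, S_k) = rho**|j-k|,
# v white with variance r, and X = S_{n-1}.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda rr: abs(m[rr][col]))
        m[col], m[piv] = m[piv], m[col]
        for rr in range(col + 1, n):
            factor = m[rr][col] / m[col][col]
            for k in range(col, n + 1):
                m[rr][k] -= factor * m[col][k]
    x = [0.0] * n
    for rr in range(n - 1, -1, -1):
        tail = sum(m[rr][k] * x[k] for k in range(rr + 1, n))
        x[rr] = (m[rr][n] - tail) / m[rr][rr]
    return x


rho, r, n = 0.8, 0.5, 6
psi_zz = [[rho ** abs(j - k) + (r if j == k else 0.0) for k in range(n)]
          for j in range(n)]
psi_xz = [rho ** abs(n - 1 - k) for k in range(n)]

# psi_zz is symmetric, so W psi_zz = psi_xz is the same system as psi_zz W = psi_xz.
weights = solve(psi_zz, psi_xz)

# Residual of the sampled Wiener-Hopf equation and the resulting error variance.
residual = max(abs(psi_xz[k] - sum(weights[j] * psi_zz[j][k] for j in range(n)))
               for k in range(n))
error_variance = 1.0 - sum(w * cov for w, cov in zip(weights, psi_xz))
```

Using only the last sample would give an error variance of $r/(1+r)$. The full set of weights does strictly better, because earlier samples are correlated with the signal being estimated.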


2.4 A Stochastic Optimal Regulator


Equation (2.51) will be satisfied if

$$[TT^* + I]\,X(T) = \Phi X \qquad (2.52)$$

or $X(T) = [TT^* + I]^{-1}\Phi X$, if the indicated inverse exists, which would imply

$$U = T^*[TT^* + I]^{-1}\Phi X \qquad (2.53)$$

When the system under consideration is a continuous-time dynamical system described by the differential equation

$$\dot X(t) = F(t)X(t) + G(t)U(t), \qquad X(0) = X_0 \qquad (2.54)$$

then $TU$ takes the form

$$TU = \int_0^T \Phi(T,\tau)\,G(\tau)\,U(\tau)\,d\tau \qquad (2.55)$$

and

$$TT^* = \int_0^T \Phi(T,\tau)\,G(\tau)\,G^*(\tau)\,\Phi^*(T,\tau)\,d\tau \qquad (2.56)$$

This is recognized as the controllability matrix of the system. Thus, if the system is controllable, $TT^*$ is positive definite and the existence of $[TT^* + I]^{-1}$ is assured.
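For a scalar, time-invariant version of (2.54), the controllability matrix (2.56) has a closed form. The sketch below assumes hypothetical constants $f$, $g$, and horizon $T$. It compares a midpoint-rule evaluation of (2.56) with that closed form, $g^2(e^{2fT} - 1)/(2f)$.

```python
import math

# Hypothetical scalar system (2.54) with constant coefficients:
#   xdot = f x + g u on [0, T],  Phi(T, tau) = exp(f (T - tau)).
f, g, T = -0.5, 2.0, 3.0


def phi(t, tau):
    """Scalar state transition function."""
    return math.exp(f * (t - tau))


def gramian_midpoint(steps=20000):
    """Midpoint-rule approximation of TT* in (2.56)."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += phi(T, tau) * g * g * phi(T, tau) * h
    return total


gram_numeric = gramian_midpoint()
gram_exact = g * g * (math.exp(2.0 * f * T) - 1.0) / (2.0 * f)

# [TT* + I]^{-1} exists here because TT* > 0 (the scalar system is controllable).
inverse_factor = 1.0 / (gram_exact + 1.0)
```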


We might now ask which linear transformation $K$, among all controls of the form $U = KX$, minimizes the functional

$$J(K) = \tfrac{1}{2}\{((\Phi - TK)X,\; (\Phi - TK)X) + (KX,\; KX)\} \qquad (2.57)$$

By a procedure similar to that used with the differential game, we would find that the optimal $K$ has the form

$$K = T^*[TT^* + I]^{-1}\Phi \qquad (2.58)$$

This is not a surprising answer in view of the previous result.
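The gain (2.58) can also be written $(T^*T + I)^{-1}T^*\Phi$. This is the standard push-through identity, and it is the form in which the same gain reappears in (2.72) and (2.74). The sketch below checks the identity numerically for hypothetical real $2 \times 2$ matrices, where the adjoint $T^*$ is simply the transpose.

```python
# Check, for hypothetical 2x2 matrices, that the gain (2.58),
#   K = T* [TT* + I]^{-1} Phi,
# equals (T*T + I)^{-1} T* Phi, i.e. solves (T*T + I) K - T* Phi = 0.
# Real matrices are used, so the adjoint T* is the transpose.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def plus_identity(a):
    return [[a[i][j] + (1.0 if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]


def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


T = [[1.0, 0.5], [-0.3, 2.0]]     # hypothetical T
Phi = [[0.9, 0.1], [0.0, 1.1]]    # hypothetical Phi
Ts = transpose(T)

K_258 = matmul(matmul(Ts, inv2(plus_identity(matmul(T, Ts)))), Phi)
K_alt = matmul(matmul(inv2(plus_identity(matmul(Ts, T))), Ts), Phi)

# Residual of (T*T + I) K - T* Phi = 0 for K taken from (2.58).
lhs = matmul(plus_identity(matmul(Ts, T)), K_258)
rhs = matmul(Ts, Phi)
residual = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
gap = max(abs(K_258[i][j] - K_alt[i][j]) for i in range(2) for j in range(2))
```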

We may now consider the stochastic version of this problem. Assume that we do not know $X$ exactly, but do know its conditional mean $\hat X$ and its conditional covariance $\Psi_{XX}$, these quantities being conditioned on the observation of a correlated random variable $Z$. The correlation between $X$ and $Z$ is denoted $\Psi_{XZ}$. Random variable $Z$ has conditional mean $\hat Z$ and conditional covariance $\Psi_{ZZ}$, these quantities being conditioned on the observed history of $Z$. We invoke the certainty-coincidence principle and the criteria of simplicity and realizability to postulate the form of $U$ as

$$U = K[\hat X + L(Z - \hat Z)] \qquad (2.59)$$

We thus ask for the values of $K$ and $L$ which minimize the functional

$$J(K,L) = \tfrac{1}{2}E\Big\{\big(\Phi X - TK[\hat X + L(Z - \hat Z)],\; \Phi X - TK[\hat X + L(Z - \hat Z)]\big) + \big(K[\hat X + L(Z - \hat Z)],\; K[\hat X + L(Z - \hat Z)]\big)\Big\} \qquad (2.60)$$

Forming the derivative of this functional with respect to $L$ and setting this equal to zero, we have

$$\frac{\partial J(K,L)}{\partial L} = K^*T^*TK[\hat X + L(Z - \hat Z)] - K^*T^*\Phi X + K^*K[\hat X + L(Z - \hat Z)] = 0 \qquad (2.61)$$

which will be satisfied if

$$[T^*T + I]\,K[\hat X + L(Z - \hat Z)] - T^*\Phi X = 0 \qquad (2.62)$$

We interpret this to mean that the expression (2.62) above is orthogonal to any linear transformation of the quantity $Z - \hat Z$. In particular, (2.62) is orthogonal to $[(T^*T + I)K - T^*\Phi]L(Z - \hat Z)$, and we may express this by

$$E\{([T^*T + I]K[\hat X + L(Z - \hat Z)] - T^*\Phi X,\; [(T^*T + I)K - T^*\Phi]L(Z - \hat Z))\} = 0 \qquad (2.63)$$

We may also differentiate (2.60) with respect to the transformation $K$ and set this equal to zero. Doing this, we have

$$\frac{\partial J(K,L)}{\partial K} = T^*TK[\hat X + L(Z - \hat Z)] - T^*\Phi X + K[\hat X + L(Z - \hat Z)] = 0 \qquad (2.64)$$

Again we interpret this to mean that (2.64) is orthogonal to any linear transformation of the quantity $\hat X + L(Z - \hat Z)$, in particular to $[(T^*T + I)K - T^*\Phi][\hat X + L(Z - \hat Z)]$. This we express as

$$E\{([T^*T + I]K[\hat X + L(Z - \hat Z)] - T^*\Phi X,\; [(T^*T + I)K - T^*\Phi][\hat X + L(Z - \hat Z)])\} = 0 \qquad (2.65)$$

Combining (2.63) and (2.65), we have

$$E\{([T^*T + I]K[\hat X + L(Z - \hat Z)] - T^*\Phi X,\; [(T^*T + I)K - T^*\Phi]\hat X)\} = 0 \qquad (2.66)$$

We may write $X = \hat X + \epsilon$ and then, using the fact that the estimation error is orthogonal to any linear transformation of the conditional mean (for normal random variables), we rewrite (2.66) as

$$E\{([T^*T + I]K[\hat X + L(Z - \hat Z)] - T^*\Phi\hat X,\; [(T^*T + I)K - T^*\Phi]\hat X)\} = 0 \qquad (2.67)$$

or, defining $A = (T^*T + I)K$, we write (2.67) as

$$E\{((A - T^*\Phi)\hat X + AL(Z - \hat Z),\; (A - T^*\Phi)\hat X)\} = 0 \qquad (2.68)$$

Proper interpretation of (2.68) implies that

$$E\{(A[\hat X + L(Z - \hat Z)] - T^*\Phi X,\; AL(Z - \hat Z))\} = 0 \qquad (2.69)$$

or, again writing $X = \hat X + \epsilon$ and noting that the estimation error is orthogonal to all linear transformations of the observables (for normal random variables), we may write (2.69) as

$$E\{((A - T^*\Phi)\hat X + AL(Z - \hat Z),\; AL(Z - \hat Z))\} = 0 \qquad (2.70)$$

Subtracting (2.68) from (2.70), we have

$$E\{(AL(Z - \hat Z),\; AL(Z - \hat Z)) - ((A - T^*\Phi)\hat X,\; (A - T^*\Phi)\hat X)\} = 0 \qquad (2.71)$$

Now, since the first term on the left depends only on the covariances of the observation noise and of the initial values of $X$, and the second term depends on the mean initial value, for (2.71) to be satisfied we must have

$$A - T^*\Phi = (T^*T + I)K - T^*\Phi = 0 \qquad (2.72)$$

which implies

$$AL = (T^*T + I)KL = 0 \qquad (2.73)$$

Equation (2.72) is the relation which describes the feedback gain $K$ for the deterministic regulator problem, so the solution of (2.72) is known to be

$$K = T^*[TT^* + I]^{-1}\Phi \qquad (2.74)$$

Substituting this expression into (2.73), we have after some manipulation

$$T^*\Phi L = 0 \qquad (2.75)$$

This will be satisfied if $L = 0$. Thus the stochastic optimal controller is

$$U_{\text{STOCH OPT}} = T^*[TT^* + I]^{-1}\Phi\hat X \qquad (2.76)$$

We have obtained a weakened version of the separation theorem; i.e., we assumed a control function of the form

$$U = K[\hat X + L(Z - \hat Z)]; \qquad \hat X = \text{best linear estimate of } X \qquad (2.77)$$

and found that $L = 0$ and that $K$ is equal to the feedback gain matrix of the deterministically optimal control.
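The conclusion $L = 0$ can be seen directly in a scalar sketch of (2.60). This requires one extra assumption not made in the report: $\hat X$, the residual $\nu = Z - \hat Z$, and the error $\epsilon$ are zero-mean and mutually uncorrelated. The numbers are hypothetical. Under these assumptions the cost separates, and a grid search lands on $L = 0$ with $K$ equal to the scalar form of (2.74).

```python
# Scalar sketch of (2.60) under stated assumptions (hypothetical numbers):
# X = Xhat + eps, with Xhat, the residual nu = Z - Zhat and the error eps
# zero-mean and mutually uncorrelated, with variances s_hat, s_nu, s_eps.
phi, t = 1.5, 0.8
s_hat, s_nu, s_eps = 2.0, 0.7, 0.4


def cost(K, L):
    """0.5 E{(phi X - t K (Xhat + L nu))^2 + (K (Xhat + L nu))^2}."""
    tracking = (phi - t * K) ** 2 * s_hat + phi ** 2 * s_eps + (t * K * L) ** 2 * s_nu
    effort = K ** 2 * s_hat + (K * L) ** 2 * s_nu
    return 0.5 * (tracking + effort)


K_det = t * phi / (t * t + 1.0)   # scalar form of (2.74)

grid = [(K_det + 0.01 * i, 0.05 * j) for i in range(-20, 21) for j in range(-20, 21)]
K_best, L_best = min(grid, key=lambda kl: cost(*kl))
```

Any nonzero $L$ only adds the nonnegative term $(t^2 + 1)K^2L^2 s_\nu$ to the cost, which is why the search settles on $L = 0$.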

2.5 A Stochastic Differential Game - Special Case

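We now return to the stochastic game problem, having developed some techniques and insights which will prove useful. The functional we wish to minimaximize is given by

$$J = \tfrac{1}{2}E\Big\{\big(\Phi X - T_1K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] + T_2K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)],\; \Phi X - T_1K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] + T_2K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)]\big) + \big(K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)],\; K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)]\big) - \big(K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)],\; K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)]\big)\Big\} \qquad (2.78)$$

In some special cases it is possible to simplify this problem by decoupling, so that each player solves an independent stochastic optimal regulator problem. Therefore, before proceeding to the general problem, we examine one of these special cases. By making the following changes of variables,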

T2  -  (I4TiTi>T2  (I^2*TiT1T2)"*  (2.79) 


Kl[WZA>l  Klt^l+Ijl^Zl’^lO  +  TiT2Ki^2+Il2^Z2“^2^1 

(2.60) 

Kg  -  (i-KT^T^Tg^Kg  (2.81) 

the  payoff  functional  becomes 


J(K^ ,1,  .lU,!,)  «  | 

*X  -  T1K^X1.l1(J1-S1)]  *  TjxJx2+L2(Z2-52)]) 

+  <KIVL1(VZ1>]’  *IWVS1>]  ) 

+  TlT2EJ^2+L2^Zl"^l) 

-  ((IVrj’T^Tg)^  k^32-h.2(z2-S2)], 

(W2*TjT*T2)^  K^2*X.2(Z24)])}  (2.82) 
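Differentiating this functional with respect to $L_1$ and $L_2$ and setting the results equal to zero, we have

$$\frac{\partial J}{\partial L_1} = K_1^*T_1^*T_1K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] - K_1^*T_1^*\Phi X + K_1^*K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] = 0 \qquad (2.83)$$

$$\frac{\partial J}{\partial L_2} = K_2^*T_2^*T_2K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)] + K_2^*T_2^*\Phi X + K_2^*K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)] = 0 \qquad (2.84)$$

Differentiating with respect to $K_1$ and $K_2$, we have

$$T_1^*T_1K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] - T_1^*\Phi X + K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] = 0 \qquad (2.85)$$

$$T_2^*T_2K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)] + T_2^*\Phi X + K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)] = 0 \qquad (2.86)$$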


These equations are seen to be independent and identical in form to those of the stochastic optimal regulator problem. Thus, the two players play the transformed game using minimum-variance type state estimators, transforming their strategies back to the original game by use of the transformation equations. However, this solution is limited in usefulness in that it requires player 1 to know the quantity $Z_2 - \hat Z_2$, his opponent's observation, a circumstance which would rarely be true. This result is essentially that of Behn and Ho [3] but is a slight generalization in that neither player need have exact knowledge of the initial state.

2.6  A  More  General  Stochastic  Differential  Game 

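In the case of the general stochastic differential game, it can be shown by techniques similar to those used in equations (2.61) through (2.74) that the minimax values of $K_1$ and $K_2$ are given by

$$K_1 = T_1^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0) \qquad (2.87)$$

$$K_2 = T_2^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0) \qquad (2.88)$$

These are seen to be the deterministically optimal feedback gains. Analogous to (2.75), but considerably more complicated, are the equations describing $L_1$ and $L_2$. With $\epsilon_i = X - \hat X_i$, and with $\Psi_{\epsilon_iZ_j}$ and $\Psi_{Z_iZ_j}$ denoting the cross-covariances of the estimation errors and of the residuals $Z_j - \hat Z_j$, they may be written

$$(I + T_1T_1^*)[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0)\,[\Psi_{\epsilon_1Z_1} - L_1\Psi_{Z_1Z_1}] - T_2T_2^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0)\,[\Psi_{\epsilon_2Z_1} - L_2\Psi_{Z_2Z_1}] = 0 \qquad (2.89)$$

$$T_1T_1^*[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0)\,[\Psi_{\epsilon_1Z_2} - L_1\Psi_{Z_1Z_2}] + (I - T_2T_2^*)[I + T_1T_1^* - T_2T_2^*]^{-1}\Phi(T,t_0)\,[\Psi_{\epsilon_2Z_2} - L_2\Psi_{Z_2Z_2}] = 0 \qquad (2.90)$$

The above equations are necessary conditions which must be satisfied by the linear operations on noisy state observations that make up part of the strategies assumed in (2.38) and (2.39). The derivation of equations (2.87) through (2.90) is given in the Appendix.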
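In the scalar, deterministic case the gains (2.87) and (2.88) can be checked directly against the saddle-point conditions of a one-shot game. The sketch below uses hypothetical values of $t_1$, $t_2$, $\Phi$, and $x$. It assumes $t_2^2 < 1$, so that the maximizing player's problem is concave.

```python
# Scalar sketch of (2.87)-(2.88) for a deterministic one-shot game
# (hypothetical numbers):  J = 0.5[(phi x - t1 u1 + t2 u2)^2 + u1^2 - u2^2],
# player 1 minimizing over u1, player 2 maximizing over u2 (needs t2^2 < 1).
def saddle_gains(t1, t2, phi):
    m = 1.0 + t1 * t1 - t2 * t2        # scalar [I + T1 T1* - T2 T2*]
    return t1 * phi / m, t2 * phi / m  # K1, K2


def J(u1, u2):
    y_ = phi * x - t1 * u1 + t2 * u2
    return 0.5 * (y_ ** 2 + u1 ** 2 - u2 ** 2)


t1, t2, phi, x = 0.7, 0.5, 1.2, 1.0
K1, K2 = saddle_gains(t1, t2, phi)
u1, u2 = K1 * x, K2 * x
y = phi * x - t1 * u1 + t2 * u2

# First-order conditions: dJ/du1 = -t1 y + u1 = 0 and dJ/du2 = t2 y - u2 = 0.
grad_u1 = -t1 * y + u1
grad_u2 = t2 * y - u2
```

Moving $u_1$ away from $K_1x$ raises $J$, and moving $u_2$ away from $K_2x$ lowers it, which is the saddle-point property.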


We may make the following observations at this point:

i) $L_1 = L_2 = 0$ is not a solution to this set of equations; hence, we see that the optimal linear strategy is not a certainty-equivalent strategy.

ii) As $\Psi_{Z_1Z_1}$, $\Psi_{Z_2Z_2}$, $\Psi_{\epsilon_1Z_2}$, and $\Psi_{\epsilon_2Z_1}$ become small, $L_1 = L_2 = 0$ tends more nearly to satisfy (2.89) and (2.90); hence, our solution satisfies the certainty-coincidence principle.

We may illustrate the use of the theory just developed by a simple example due to Willman [28].


2.7  A  Simple  Example 


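Example: Discrete-time, one-stage scalar game

Transition equation: $Y = X + U - V$
Payoff functional: $J = \tfrac{1}{2}E\{aY^2 + U^2 - cV^2\}$, $c > a > 0$
Observation equations: $Z_1 = X + \eta_1$, $Z_2 = X + \eta_2$; $(\eta_1, \eta_2)$ is normal, $\left(0, \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix}\right)$; $X$ is normal, $(0, P)$

Making the following definitions and changes of variables,

$U_1 = U$, $\quad U_2 = \sqrt{c}\,V$, $\quad T_1 = -\sqrt{a}$, $\quad T_2 = -\sqrt{a/c}$, $\quad \Phi = \sqrt{a}$,

under these transformations the problem statement becomes:

Transition equation: $\sqrt{a}\,Y = \Phi X - T_1U_1 + T_2U_2$
Payoff functional: $J = \tfrac{1}{2}E\{(\Phi X - T_1U_1 + T_2U_2)^2 + U_1^2 - U_2^2\}$
Observation equations: $Z_1 = X + \eta_1$, $Z_2 = X + \eta_2$; $(\eta_1, \eta_2)$ is normal, $(0, \operatorname{diag}(R_1, R_2))$; $X$ is normal, $(0, P)$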


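We first derive expressions for the feedback gains $K_1$ and $K_2$:

$$K_1 = T_1[1 + T_1^2 - T_2^2]^{-1}\Phi = \frac{-\sqrt{a}\,\sqrt{a}}{1 + a - a/c}, \qquad K_2 = T_2[1 + T_1^2 - T_2^2]^{-1}\Phi = \frac{-\sqrt{a/c}\,\sqrt{a}}{1 + a - a/c}$$

which simplify to

$$K_1 = \frac{-ac}{c(a+1) - a}, \qquad K_2 = \frac{-a\sqrt{c}}{c(a+1) - a}$$

For this example the equations for $L_1$ and $L_2$ become, after multiplication by constant factors,

$$c(a+1)L_1\Psi_{Z_1Z_1} - aL_2\Psi_{Z_2Z_1} = c(a+1)\Psi_{\epsilon_1Z_1} - a\Psi_{\epsilon_2Z_1}$$

$$acL_1\Psi_{Z_1Z_2} + (c-a)L_2\Psi_{Z_2Z_2} = ac\Psi_{\epsilon_1Z_2} + (c-a)\Psi_{\epsilon_2Z_2}$$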
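The gain formulas can be checked numerically. The sketch below uses hypothetical weights $a = 1$, $c = 3$. It evaluates $K_1$ and $K_2$ from the transformed quantities, compares them with the simplified forms, and then confirms in the original variables ($U = K_1X$, $V = K_2X/\sqrt{c}$) that both first-order conditions of the deterministic game hold.

```python
import math

# Willman example, hypothetical weights with c > a > 0.
a, c = 1.0, 3.0
T1, T2, PHI = -math.sqrt(a), -math.sqrt(a / c), math.sqrt(a)

m = 1.0 + T1 * T1 - T2 * T2        # scalar [1 + T1^2 - T2^2]
K1 = T1 * PHI / m
K2 = T2 * PHI / m

K1_closed = -a * c / (c * (a + 1.0) - a)
K2_closed = -a * math.sqrt(c) / (c * (a + 1.0) - a)

# Deterministic check in the original variables: U = K1 X, V = K2 X / sqrt(c)
# should make dJ/dU = a Y + U and dJ/dV = -a Y - c V vanish, with Y = X + U - V.
X = 1.0
U, V = K1 * X, K2 * X / math.sqrt(c)
Y = X + U - V
```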


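We must now derive expressions for the conditional means and covariances:

$$\hat X_1 = E\{X \mid Z_1\} = \frac{P}{P+R_1}Z_1, \qquad \hat Z_1 = \hat X_1, \qquad Z_1 - \hat Z_1 = \frac{R_1}{P+R_1}(X + \eta_1)$$

$$\hat X_2 = E\{X \mid Z_2\} = \frac{P}{P+R_2}Z_2, \qquad \hat Z_2 = \hat X_2, \qquad Z_2 - \hat Z_2 = \frac{R_2}{P+R_2}(X + \eta_2)$$

$$\epsilon_1 = X - \hat X_1 = \frac{R_1X - P\eta_1}{P+R_1}, \qquad \epsilon_2 = X - \hat X_2 = \frac{R_2X - P\eta_2}{P+R_2}$$

$$\Psi_{Z_1Z_1} = E\{(Z_1 - \hat Z_1)^2\} = \frac{R_1^2}{P+R_1}, \qquad \Psi_{Z_2Z_2} = E\{(Z_2 - \hat Z_2)^2\} = \frac{R_2^2}{P+R_2}$$

$$\Psi_{Z_1Z_2} = E\{(Z_1 - \hat Z_1)(Z_2 - \hat Z_2)\} = \frac{PR_1R_2}{(P+R_1)(P+R_2)}$$

$$\Psi_{\epsilon_1Z_1} = E\{(X - \hat X_1)(Z_1 - \hat Z_1)\} = \frac{R_1^2P - PR_1^2}{(P+R_1)^2} = 0, \qquad \Psi_{\epsilon_1Z_2} = E\{(X - \hat X_1)(Z_2 - \hat Z_2)\} = \frac{PR_1R_2}{(P+R_1)(P+R_2)}$$

$$\Psi_{\epsilon_2Z_1} = E\{(X - \hat X_2)(Z_1 - \hat Z_1)\} = \frac{PR_1R_2}{(P+R_1)(P+R_2)}, \qquad \Psi_{\epsilon_2Z_2} = E\{(X - \hat X_2)(Z_2 - \hat Z_2)\} = \frac{R_2^2P - PR_2^2}{(P+R_2)^2} = 0$$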
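The table of second moments above can be checked exactly. Every quantity in it is a linear combination of the independent variables $(X, \eta_1, \eta_2)$, so each expectation reduces to a weighted dot product of coefficient triples. The sketch below uses hypothetical variances and Python's `fractions` module so that the comparisons are exact. The helper names `cov`, `scale`, and `sub` are illustrative.

```python
from fractions import Fraction as Fr

# Exact second moments for the Willman example. Every random quantity is a
# linear combination of (X, eta1, eta2), which are independent, zero-mean,
# with variances (P, R1, R2). The numbers are hypothetical.
P, R1, R2 = Fr(2), Fr(1), Fr(3)
VARIANCES = (P, R1, R2)


def cov(u, v):
    """E{u v} for u, v given as coefficient triples over (X, eta1, eta2)."""
    return sum(p * q * s for p, q, s in zip(u, v, VARIANCES))


def scale(k, u):
    return tuple(k * p for p in u)


def sub(u, v):
    return tuple(p - q for p, q in zip(u, v))


X, Z1, Z2 = (1, 0, 0), (1, 1, 0), (1, 0, 1)
X1_hat = scale(P / (P + R1), Z1)   # E{X | Z1}; here Z1_hat = H1 X1_hat = X1_hat
X2_hat = scale(P / (P + R2), Z2)
zeta1, zeta2 = sub(Z1, X1_hat), sub(Z2, X2_hat)   # residuals Z_i - Zhat_i
eps1, eps2 = sub(X, X1_hat), sub(X, X2_hat)       # estimation errors

stats = {
    "Z1Z1": cov(zeta1, zeta1), "Z2Z2": cov(zeta2, zeta2), "Z1Z2": cov(zeta1, zeta2),
    "e1Z1": cov(eps1, zeta1), "e1Z2": cov(eps1, zeta2),
    "e2Z1": cov(eps2, zeta1), "e2Z2": cov(eps2, zeta2),
}
```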

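Substituting these expressions into the equations describing $L_1$ and $L_2$, we have

$$c(a+1)L_1\frac{R_1^2}{P+R_1} - aL_2\frac{PR_1R_2}{(P+R_1)(P+R_2)} = \frac{-aPR_1R_2}{(P+R_1)(P+R_2)}$$

$$acL_1\frac{PR_1R_2}{(P+R_1)(P+R_2)} + (c-a)L_2\frac{R_2^2}{P+R_2} = \frac{acPR_1R_2}{(P+R_1)(P+R_2)}$$

which simplify to

$$cR_1(a+1)(P+R_2)L_1 - aPR_2L_2 = -aPR_2$$

$$acPR_1L_1 + R_2(c-a)(P+R_1)L_2 = acPR_1$$

whence

$$L_1 = \frac{aP[acPR_1 - (c-a)(P+R_1)R_2]}{cR_1[(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2]}, \qquad L_2 = \frac{aP[aPR_2 + (a+1)(P+R_2)cR_1]}{R_2[(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2]}$$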


We may now derive expressions for the control functions:

$$U_1 = K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] = \frac{-ac}{c(a+1) - a}\left[\frac{P}{P+R_1} + \frac{aP[acPR_1 - (c-a)(P+R_1)R_2]}{c(P+R_1)[(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2]}\right]Z_1$$

After some manipulation, this becomes

$$U_1 = \frac{-aP[(c-a)(P+R_2) + aP]}{(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2}\,Z_1$$

Similarly,

$$U_2 = K_2[\hat X_2 + L_2(Z_2 - \hat Z_2)] = \frac{-a\sqrt{c}}{c(a+1) - a}\left[\frac{P}{P+R_2} + \frac{aP[aPR_2 + (a+1)(P+R_2)cR_1]}{(P+R_2)[(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2]}\right]Z_2$$

which after some manipulation becomes

$$U_2 = \frac{-a\sqrt{c}\,P[(P+R_1)(a+1) - aP]}{(P+R_1)(P+R_2)(a+1)(c-a) + a^2P^2}\,Z_2$$

These answers are the same as those obtained by Willman when they are retransformed to the original problem.
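The whole example can be verified in exact rational arithmetic with hypothetical values $a = 1$, $c = 3$, $P = 2$, $R_1 = 1$, $R_2 = 1/2$. The sketch:

- solves the two simplified linear equations for $L_1$ and $L_2$ and compares them with the closed forms;
- checks the resulting control gains against the formulas for $U_1$ and $U_2$;
- independently solves the one-shot game directly over strategies $U = \alpha Z_1$, $V = \beta Z_2$ in the original variables.

To keep everything rational, the factor $\sqrt{c}$ in $K_2$ is carried separately.

```python
from fractions import Fraction as Fr

# Exact check of the Willman example (hypothetical values, c > a > 0).
a, c = Fr(1), Fr(3)
P, R1, R2 = Fr(2), Fr(1), Fr(1, 2)

E = c * (a + 1) - a
K1 = -a * c / E
K2_over_root_c = -a / E            # K2 = -a sqrt(c) / E; sqrt(c) kept aside

# Simplified linear equations for L1 and L2, solved by Cramer's rule.
A11, A12, b1 = c * R1 * (a + 1) * (P + R2), -a * P * R2, -a * P * R2
A21, A22, b2 = a * c * P * R1, R2 * (c - a) * (P + R1), a * c * P * R1
det = A11 * A22 - A12 * A21
L1 = (b1 * A22 - A12 * b2) / det
L2 = (A11 * b2 - A21 * b1) / det

D = (P + R1) * (P + R2) * (a + 1) * (c - a) + a * a * P * P

# U1 = alpha Z1 and V = U2 / sqrt(c) = beta Z2, using Z_i - Zhat_i = R_i Z_i / (P + R_i).
alpha = K1 * (P + L1 * R1) / (P + R1)
beta = K2_over_root_c * (P + L2 * R2) / (P + R2)

# Direct solution of the one-shot game over U = alpha' Z1, V = beta' Z2:
#   alpha'(a+1)(P+R1) - a beta' P = -aP,   a alpha' P + (c-a) beta'(P+R2) = -aP.
d = (a + 1) * (P + R1) * (c - a) * (P + R2) + a * P * a * P
alpha_direct = (-a * P * (c - a) * (P + R2) - a * P * a * P) / d
beta_direct = ((a + 1) * (P + R1) * (-a * P) + a * P * a * P) / d
```

The two routes agree, and $L_2 \neq 0$, in line with observation (i) that $L_1 = L_2 = 0$ is generally not a solution.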

We note that the problem was solved in three parts:
i) The feedback gains were derived.
ii) The conditional means and covariances were derived.
iii) The expressions for $L_1$ and $L_2$ were derived.

Of these steps, (i) is relatively straightforward and would be done in the course of solving the deterministic game. Furthermore, the procedure is not altered essentially when higher-dimensional multistage or continuous-time games are considered. Steps (ii) and (iii) are simplified immensely when one-stage discrete-time games are considered, because the problem of obtaining the conditional statistics is isolated from that of obtaining $L_1$ and $L_2$; i.e., steps (ii) and (iii) may be taken separately. In multistage or continuous-time games the covariance of the state depends on $L_1$ and $L_2$, and vice versa. The result of this is that the conditional statistics and $L_1$ and $L_2$ must be obtained simultaneously.

No attempt to perform this computation will be made, since the ensuing analysis will show that no computationally feasible solution exists. In Chapter 3 the problem of computing conditional statistics is taken up under the simplifying assumption that $L_1 = L_2 = 0$. It is shown there that, even under this assumption, computation of the conditional mean of the state requires that each controller retain the entire past history of his observations. This data storage requirement


a  different  approach  is  taken  which 
iraized  over  a  set  of  computationally 


Chapter  3 

PROBLEMS OF STATE ESTIMATION IN TWO-INPUT COOPERATIVE
AND COMPETITIVE CONTROL SITUATIONS

3.1 Discrete-Time Case

To illustrate the various considerations affecting the problem of estimating the state of a linear system controlled by two or more inputs derived from independently made state observations, we begin with a discrete-time example.

Suppose we have a system described by the difference equation

$$X(i+1) = \Phi(i+1,i)X(i) - G_1(i)U_1(i) + G_2(i)U_2(i) \qquad (3.1)$$

where $X(\cdot)$ is an $n$-vector, $E\{X(0)\} = X_0$, $\operatorname{Cov}[X(0)] = \Psi_{XX}(0)$, and $\Phi(i+1,i)$ is a state transition matrix and thus satisfies relations such as

$$\Phi(i,i) = I \;\; (I = \text{identity matrix}), \qquad \Phi(i+1,i) = F(i)\Phi(i,i) \qquad (3.2)$$

It was pointed out at the end of the previous chapter that the conditional statistics and the optimal $L_1$ and $L_2$ must be obtained simultaneously. Since here we are primarily interested in providing an expository development, we initially treat a simplified version of the problem: we shall assume that $L_1$ and $L_2$ are known by both players, so that we have only to deal with the state estimation problem. Furthermore, we shall assume that controller number 2 is restricted to $L_2 = 0$. Thus,

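$$U_2(i) = K_2(i)\hat X_2(i) \qquad (3.3)$$

$$\hat X_2(i) = E\{X(i) \mid \mathcal{Z}_2(i)\} \qquad (3.4)$$

$$\mathcal{Z}_2(i) = \{Z_2(0), Z_2(1), \ldots, Z_2(i)\} \qquad (3.5)$$

$$Z_2(j) = H_2(j)X(j) + \eta_2(j); \qquad j = 0, 1, \ldots \qquad (3.6)$$

where $H_2(j)$ is an $m_2 \times n$ matrix and $\eta_2$ is white, Gaussian, and

$$E\{\eta_2(i)\eta_2^*(j)\} = R_2(i)\,\delta_{ij} \qquad (3.7)$$

and where $\delta_{ij}$ is the Kronecker delta. The problem then is to compute

$$\hat X_1(i) = E\{X(i) \mid \mathcal{Z}_1(i)\} \qquad (3.8)$$

where

$$\mathcal{Z}_1(i) = \{Z_1(0), Z_1(1), \ldots, Z_1(i)\} \qquad (3.9)$$

$$Z_1(j) = H_1(j)X(j) + \eta_1(j) \qquad (3.10)$$

$$\eta_1 \text{ white, Gaussian}, \qquad E\{\eta_1(i)\eta_1^*(j)\} = R_1(i)\,\delta_{ij} \qquad (3.11)$$

and $\eta_1$ and $\eta_2$ are independent. The following relations hold: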

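$$\bar X_1(i+1) = E\{X(i+1) \mid \mathcal{Z}_1(i)\} = \Phi(i+1,i)\hat X_1(i) + G_2(i)K_2(i)\hat X_{21}(i) - G_1(i)U_1(i) \qquad (3.12)$$

where

$$\hat X_{21}(i) = E\{\hat X_2(i) \mid \mathcal{Z}_1(i)\} \qquad (3.13)$$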


We know that $X(i+1)$ and $Z_1(i+1)$ are correlated Gaussian random vectors, that $X(i+1)$ has conditional mean $\bar X_1(i+1)$, and that $Z_1(i+1)$ has conditional mean $H_1(i+1)\bar X_1(i+1) = \bar Z_1(i+1)$. Thus, by a well-known property of Gaussian random vectors [8, p. 32], we may write

$$\hat X_1(i+1) = \bar X_1(i+1) + \Lambda_1(i+1)[Z_1(i+1) - \bar Z_1(i+1)] \qquad (3.14)$$

where

$$\Lambda_1(i+1) = \Psi_{XZ_1}(i+1)\,\Psi_{Z_1Z_1}^{-1}(i+1) \qquad (3.15)$$

$$\Psi_{XZ_1}(i+1) = E\{[X(i+1) - \bar X_1(i+1)][Z_1(i+1) - \bar Z_1(i+1)]^*\} \qquad (3.16)$$

$$\Psi_{Z_1Z_1}(i+1) = E\{[Z_1(i+1) - \bar Z_1(i+1)][Z_1(i+1) - \bar Z_1(i+1)]^*\} \qquad (3.17)$$

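$\Lambda_1(i+1)$ is conventionally given in another form. If we define the error covariance matrix by the equation

$$P_{11}(i+1) = E\{[X(i+1) - \hat X_1(i+1)][X(i+1) - \hat X_1(i+1)]^*\} = E\{\epsilon_1(i+1)\epsilon_1^*(i+1)\} \qquad (3.18)$$

where $\epsilon_1(i+1)$ is the estimation error, then since

$$\begin{aligned}\epsilon_1(i+1) &= X(i+1) - \hat X_1(i+1) \\ &= X(i+1) - \bar X_1(i+1) - \Lambda_1(i+1)[Z_1(i+1) - \bar Z_1(i+1)] \\ &= X(i+1) - \bar X_1(i+1) - \Lambda_1(i+1)\{H_1(i+1)[X(i+1) - \bar X_1(i+1)] + \eta_1(i+1)\}\end{aligned}$$

and because $\eta_1(i+1)$ is independent of $X(i+1) - \bar X_1(i+1)$, we have

$$P_{11}(i+1) = \Psi_{XX}(i+1) - \Lambda_1(i+1)H_1(i+1)\Psi_{XX}(i+1)$$

where $\Psi_{XX}(i+1) = E\{[X(i+1) - \bar X_1(i+1)][X(i+1) - \bar X_1(i+1)]^*\}$. Furthermore, since $Z_1(i+1) = H_1(i+1)X(i+1) + \eta_1(i+1)$, we have

$$\Psi_{XZ_1}(i+1) = \Psi_{XX}(i+1)H_1^*(i+1), \qquad \Psi_{Z_1Z_1}(i+1) = H_1(i+1)\Psi_{XX}(i+1)H_1^*(i+1) + R_1(i+1)$$

We may thus write

$$R_1(i+1) = \Psi_{Z_1Z_1}(i+1) - H_1(i+1)\Psi_{XX}(i+1)H_1^*(i+1)$$

$$I = \Psi_{Z_1Z_1}^{-1}(i+1)R_1(i+1) + \Psi_{Z_1Z_1}^{-1}(i+1)H_1(i+1)\Psi_{XX}(i+1)H_1^*(i+1)$$

$$\Psi_{Z_1Z_1}^{-1}(i+1) = [I - \Psi_{Z_1Z_1}^{-1}(i+1)H_1(i+1)\Psi_{XX}(i+1)H_1^*(i+1)]\,R_1^{-1}(i+1) \qquad (3.24)$$

and

$$\begin{aligned}\Lambda_1(i+1) &= \Psi_{XZ_1}(i+1)\Psi_{Z_1Z_1}^{-1}(i+1) \\ &= \Psi_{XX}(i+1)H_1^*(i+1)[I - \Psi_{Z_1Z_1}^{-1}(i+1)H_1(i+1)\Psi_{XX}(i+1)H_1^*(i+1)]R_1^{-1}(i+1) \\ &= [\Psi_{XX}(i+1) - \Psi_{XX}(i+1)H_1^*(i+1)\Psi_{Z_1Z_1}^{-1}(i+1)H_1(i+1)\Psi_{XX}(i+1)]H_1^*(i+1)R_1^{-1}(i+1) \\ &= P_{11}(i+1)H_1^*(i+1)R_1^{-1}(i+1)\end{aligned} \qquad (3.25)$$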

Combining (3.12), (3.14), and (3.25), we have

$$\hat X_1(i+1) = \Phi(i+1,i)\hat X_1(i) + P_{11}(i+1)H_1^*(i+1)R_1^{-1}(i+1)[Z_1(i+1) - H_1(i+1)\bar X_1(i+1)] + G_2(i)K_2(i)\hat X_{21}(i) - G_1(i)U_1(i) \qquad (3.26)$$
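The equivalence of the two gain expressions, (3.15) and (3.25), together with the inverse identity (3.24), can be confirmed exactly in the scalar case. The values of $\Psi_{XX}$, $H_1$, and $R_1$ below are hypothetical. The function `update` is an illustrative name for the measurement-update step of (3.14).

```python
from fractions import Fraction as Fr

# Scalar check of (3.24)-(3.25) with hypothetical values.
psi_xx = Fr(5, 2)   # Psi_XX(i+1): covariance of X(i+1) - Xbar_1(i+1)
h = Fr(3, 2)        # H_1(i+1)
r = Fr(4)           # R_1(i+1)

psi_xz = psi_xx * h                 # Psi_XZ1 = Psi_XX H1*
psi_zz = h * psi_xx * h + r         # Psi_Z1Z1 = H1 Psi_XX H1* + R1
gain = psi_xz / psi_zz              # Lambda_1 from (3.15)
p11 = psi_xx - gain * h * psi_xx    # updated error covariance P11(i+1)


def update(x_bar, z):
    """Measurement update (3.14): x_hat = x_bar + gain (z - h x_bar)."""
    return x_bar + gain * (z - h * x_bar)
```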

This equation involves the quantity $\hat X_{21}(i)$. To compute $\hat X_{21}(i)$, let us initially assume that controller number 2 uses a state estimate which has the form

$$\hat X_2(i) = \Theta(i,0)X(0) + \sum_{j=0}^{i} W(i,j)Z_2(j) \qquad (3.27)$$

with no restrictions for the moment on the matrices $\Theta(i,0)$ or $W(i,j)$. However, we do assume that these matrices are known to controller number 1. Since $X(0)$ is known to both players, we may write

$$\hat X_{21}(i) = \Theta(i,0)X(0) + \sum_{j=0}^{i} W(i,j)\,E\{Z_2(j) \mid \mathcal{Z}_1(i)\} \qquad (3.28)$$

and because $Z_2(j)$ and $Z_1(i)$ are correlated Gaussian random variables, we may write

$$E\{Z_2(j) \mid \mathcal{Z}_1(i)\} = E\{Z_2(j) \mid \mathcal{Z}_1(i-1)\} + M(j,i)[Z_1(i) - \bar Z_1(i)] \qquad (3.29)$$

where

$$M(j,i) = \Psi_{Z_2Z_1}(j,i)\,\Psi_{Z_1Z_1}^{-1}(i) \qquad (3.30)$$

$$\Psi_{Z_2Z_1}(j,i) = E\{[Z_2(j) - E\{Z_2(j) \mid \mathcal{Z}_1(i-1)\}][Z_1(i) - \bar Z_1(i)]^*\} \qquad (3.31)$$

$$\Psi_{Z_1Z_1}(i) = E\{[Z_1(i) - \bar Z_1(i)][Z_1(i) - \bar Z_1(i)]^*\} \qquad (3.32)$$

Notice that (3.29) is a difference equation whose solution may be written

$$E\{Z_2(j) \mid \mathcal{Z}_1(i)\} = E\{Z_2(j)\} + \sum_{k=0}^{i} M(j,k)[Z_1(k) - \bar Z_1(k)] \qquad (3.34)$$

Thus, (3.28) may be written

$$\hat X_{21}(i) = \Theta(i,0)X(0) + \sum_{j=0}^{i} W(i,j)E\{Z_2(j)\} + \sum_{j=0}^{i} W(i,j)\sum_{k=0}^{i} M(j,k)[Z_1(k) - \bar Z_1(k)] \qquad (3.35)$$

Let us assume that

$$E\{Z_2(j)\} = D(j)X(0) \qquad (3.36)$$

Then, defining

$$T(i,0) = \Theta(i,0) + \sum_{j=0}^{i} W(i,j)D(j) \qquad (3.37)$$

we may write

$$\hat X_{21}(i) = T(i,0)X(0) + \sum_{j=0}^{i} W(i,j)\sum_{k=0}^{i} M(j,k)[Z_1(k) - \bar Z_1(k)] \qquad (3.38)$$

We observe at this point that in order to calculate $\hat X_1(i+1)$ one must know $\hat X_{21}(i)$, which in turn requires the preservation of the observations $Z_1(k)$, $k = 0, 1, \ldots, i$.


3.2 Continuous-Time Case

We are interested mainly in the continuous-time version of the equations so far derived. The continuous-time equations are obtained by the familiar process of writing $\Phi(i+1,i)$ as $\Phi(t+\Delta,t)$ and expanding $\Phi(t+\Delta,t)$ in a Taylor series as

$$\Phi(t+\Delta,t) = \Phi(t,t) + F(t)\Delta + O(\Delta^2) = I + F(t)\Delta + O(\Delta^2) \qquad (3.39)$$

We also modify the forcing terms in (3.1), so that this equation becomes

$$X(t+\Delta) = [I + F(t)\Delta + O(\Delta^2)]X(t) - G_1(t)U_1(t)\Delta + G_2(t)U_2(t)\Delta \qquad (3.40)$$

After we have subtracted $X(t)$ from both sides, divided both sides by $\Delta$, and taken the limit as $\Delta$ approaches zero, (3.40) becomes

$$\dot X(t) = F(t)X(t) - G_1(t)U_1(t) + G_2(t)U_2(t) \qquad (3.41)$$
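The limiting process in (3.39) through (3.41) can be seen numerically. Composing the one-step approximation $I + F\Delta$ over a fixed interval converges to the true transition matrix $e^{FT}$ as $\Delta \to 0$. The sketch uses a hypothetical scalar $F$ and interval $T$.

```python
import math

# (3.39)-(3.41): the one-step map I + F Delta, applied T/Delta times,
# approaches exp(F T) as Delta -> 0. Scalar F and T are hypothetical.
F, T = -0.7, 2.0
exact = math.exp(F * T)

errors = []
for steps in (10, 100, 1000, 10000):
    delta = T / steps
    approx = (1.0 + F * delta) ** steps   # product of Phi(t + Delta, t) ~ I + F Delta
    errors.append(abs(approx - exact))
```

The error shrinks roughly in proportion to $\Delta$, consistent with the neglected $O(\Delta^2)$ term accumulating over $T/\Delta$ steps.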

By a similar procedure, (3.12) becomes, upon substituting (3.39) and modifying the forcing terms,

$$\bar X_1(t+\Delta) = [I + F(t)\Delta + O(\Delta^2)]\hat X_1(t) + G_2(t)K_2(t)\hat X_{21}(t)\Delta - G_1(t)U_1(t)\Delta \qquad (3.42)$$

Letting $\Delta$ approach zero, we see that $\bar X_1(t) \to \hat X_1(t)$. Likewise, (3.26) may be written

$$\hat X_1(t+\Delta) = [I + F(t)\Delta + O(\Delta^2)]\hat X_1(t) + P_{11}(t+\Delta)H_1^*(t+\Delta)R_1^{-1}(t+\Delta)[Z_1(t+\Delta) - H_1(t+\Delta)\bar X_1(t+\Delta)]\Delta + G_2(t)K_2(t)\hat X_{21}(t)\Delta - G_1(t)U_1(t)\Delta \qquad (3.43)$$

Subtracting $\hat X_1(t)$ from both sides, dividing both sides by $\Delta$, taking limits as $\Delta$ approaches zero, and using (3.43), we have

$$\dot{\hat X}_1(t) = F(t)\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[Z_1(t) - H_1(t)\hat X_1(t)] + G_2(t)K_2(t)\hat X_{21}(t) - G_1(t)U_1(t) \qquad (3.44)$$

Here the spectral properties of $\eta_1(t)$ must be modified so that over a unit time interval the additive noise has the same corruptive influence as in the discrete-time case. Specifically, $\eta_1(t)$ is taken to be a white-noise process with spectral density $R_1(t)$:

$$E\{\eta_1(t)\eta_1^*(\tau)\} = R_1(t)\,\delta(t - \tau) \qquad (3.45)$$

where $\delta(t - \tau)$ is a Dirac delta function.

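Using these same techniques, (3.29) becomes

$$E\{Z_2(\tau_j) \mid \mathcal{Z}_1(t)\} = E\{Z_2(\tau_j) \mid \mathcal{Z}_1(t-\Delta)\} + M(\tau_j,t)[Z_1(t) - \hat Z_1(t)]\Delta \qquad (3.46)$$

Subtracting $E\{Z_2(\tau_j) \mid \mathcal{Z}_1(t-\Delta)\}$ from both sides, dividing by $\Delta$, and taking limits as $\Delta$ approaches zero, we have

$$\frac{\partial}{\partial t}E\{Z_2(\tau) \mid \mathcal{Z}_1(t)\} = M(\tau,t)[Z_1(t) - \hat Z_1(t)] \qquad (3.47)$$

Equation (3.47) has solution

$$E\{Z_2(\tau) \mid \mathcal{Z}_1(t)\} = E\{Z_2(\tau)\} + \int_0^t M(\tau,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma = D(\tau)X(0) + \int_0^t M(\tau,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma \qquad (3.48)$$

The similarity to (3.34) is obvious.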


If we assume that controller number 2 uses a state estimate of the form

$$\hat X_2(t) = \Theta(t,0)X(0) + \int_0^t W(t,\tau)Z_2(\tau)\,d\tau \qquad (3.49)$$

then the continuous-time analog of (3.38) is

$$\hat X_{21}(t) = T(t,0)X(0) + \int_0^t W(t,\tau)\int_0^t M(\tau,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma\,d\tau \qquad (3.50)$$

Thus, calculation of $\hat X_{21}(t)$ appears to require storage of $Z_1(\sigma)$, $0 \le \sigma \le t$. We now prove this to be true; i.e., (3.50) can be obtained in no simpler form.

To this point we have made no restrictive assumptions about $W(t,\tau)$ or $\Theta(t,0)$. We shall now do so, showing that in order for each controller to compute $E\{X(t) \mid \mathcal{Z}_i(t)\}$, $i = 1, 2$, he must store all past observations.

We first assume that $W(t,\tau)$ is of the form

$$W(t,\tau) = C(t)Q(t,\tau)N(\tau) \qquad (3.51)$$

where $Q(t,\tau)$ is a $p \times p$ matrix which satisfies

$$Q(t,t) = I \qquad (3.52)$$

$$\frac{\partial}{\partial t}Q(t,\tau) = \Gamma(t)Q(t,\tau) \qquad (3.53)$$

and $N(t)$ is a $p \times m_2$ matrix. We also assume that $\Theta(t,0)$ is of the form

$$\Theta(t,0) = C(t)Q(t,0)Y(0) \qquad (3.54)$$

where $C(t)$ is a differentiable $n \times p$ matrix. These assumptions are equivalent to requiring that $\hat X_2(t)$ be given by

$$\hat X_2(t) = C(t)q(t) \qquad (3.55)$$

where $q(t)$ satisfies the $p$th-order differential equation

$$\frac{d}{dt}q(t) = \Gamma(t)q(t) + N(t)Z_2(t) \qquad (3.56)$$

$$q(0) = Y(0)X(0) \qquad (3.57)$$


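We call $\hat X_2(t)$ a "p-dimensional state estimator." Under these assumptions, we see that $\hat X_{21}(t)$, given by (3.50), can be written

$$\hat X_{21}(t) = \left[C(t)Q(t,0)Y(0) + \int_0^t C(t)Q(t,\tau)N(\tau)D(\tau)\,d\tau\right]X(0) + \int_0^t C(t)Q(t,\tau)N(\tau)\int_0^t M(\tau,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma\,d\tau \qquad (3.58)$$

Defining a new variable $\hat q(t)$ by

$$\hat q(t) = \left[Q(t,0)Y(0) + \int_0^t Q(t,\tau)N(\tau)D(\tau)\,d\tau\right]X(0) + \int_0^t Q(t,\tau)N(\tau)\int_0^t M(\tau,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma\,d\tau \qquad (3.59)$$

we see that

$$\hat X_{21}(t) = C(t)\hat q(t) \qquad (3.60)$$

and that $\hat X_{21}(t)$ satisfies the integro-differential equation

$$\frac{d}{dt}\hat X_{21}(t) = \dot C(t)\hat q(t) + C(t)\frac{d}{dt}\hat q(t) \qquad (3.61)$$

where

$$\frac{d}{dt}\hat q(t) = \Gamma(t)\hat q(t) + N(t)D(t)X(0) + N(t)\int_0^t M(t,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma + \int_0^t Q(t,\tau)N(\tau)M(\tau,t)\,d\tau\,[Z_1(t) - \hat Z_1(t)] \qquad (3.62)$$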


At this point we define a new matrix

$$\Omega(t) = \int_0^t Q(t,\tau)N(\tau)M(\tau,t)\,d\tau \qquad (3.63)$$

Also, we note that because of (3.48),

$$\int_0^t M(t,\sigma)[Z_1(\sigma) - \hat Z_1(\sigma)]\,d\sigma + D(t)X(0) = E\{Z_2(t) \mid \mathcal{Z}_1(t)\} \qquad (3.64)$$

and because of the independence of $\eta_2(t)$ and $\mathcal{Z}_1(t)$,

$$E\{Z_2(t) \mid Z_1(\tau),\, 0 \le \tau \le t\} = H_2(t)E\{X(t) \mid Z_1(\tau),\, 0 \le \tau \le t\} = H_2(t)\hat X_1(t) \qquad (3.65)$$

We may thus write (3.62) as

$$\frac{d}{dt}\hat q(t) = \Gamma(t)\hat q(t) + N(t)H_2(t)\hat X_1(t) + \Omega(t)[Z_1(t) - \hat Z_1(t)] \qquad (3.66)$$

and (3.61) becomes

$$\frac{d}{dt}\hat X_{21}(t) = [\dot C(t) + C(t)\Gamma(t)]\hat q(t) + C(t)N(t)H_2(t)\hat X_1(t) + C(t)\Omega(t)[Z_1(t) - \hat Z_1(t)] \qquad (3.67)$$

Repeating (3.44), we have

$$\dot{\hat X}_1(t) = F(t)\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[Z_1(t) - H_1(t)\hat X_1(t)] + G_2(t)K_2(t)\hat X_{21}(t) - G_1(t)U_1(t) \qquad (3.68)$$

Equations (3.66), (3.67), and (3.68) taken together constitute a system of $n+p$ first-order differential equations whose solution gives the state estimate $\hat X_1(t)$. This result is intuitively reasonable: if controller number 2 is constrained to use a "p-dimensional" state estimator, then controller number 1 must use an "n+p-dimensional" state estimator. Furthermore, because of the restrictive assumptions we have made, we are actually able to solve the game problem, i.e., obtain $L_1$. This is done as follows. Since we have assumed $L_2 = 0$, we may write $U_2 = K_2(t)\hat X_2(t)$; using (3.55), this may be written $U_2 = K_2(t)C(t)q(t)$, and so (3.41) becomes

$$\dot X(t) = F(t)X(t) + G_2(t)K_2(t)C(t)q(t) - G_1(t)U_1(t) \qquad (3.69)$$

These equations may be written

$$\begin{bmatrix} \dot X \\ \dot q \end{bmatrix} = \begin{bmatrix} F(t) & G_2(t)K_2(t)C(t) \\ N(t)H_2(t) & \Gamma(t) \end{bmatrix}\begin{bmatrix} X \\ q \end{bmatrix} + \begin{bmatrix} -G_1(t) \\ 0 \end{bmatrix}U_1(t) + \begin{bmatrix} 0 \\ N(t) \end{bmatrix}\eta_2(t) \qquad (3.70)$$


3.3  Control  Applications  of  the  State  Estimation  Procedure 

The  original  criterion  functional  may  be  written  in  terms  of  this 


augmented  system  as 


J  »  E 


■feaT  f;  ;1  tel  -  f 

-/.’[afc  j«j [as] ■ 


(t)dt 


(3.71 


This  is  a  classical  one-sided  stochastic  optimal  control  problem  of  the 
linear-quadratic  type,  and  the  solution  is  well  known  to  be  of  the  form 


(3.72)


We have already observed that $\hat X_1$ satisfies (3.44), which may be combined with (3.60) to read

$$\dot{\hat X}_1 = F(t)\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[Z_1(t) - H_1(t)\hat X_1(t)] + G_2(t)K_2(t)C(t)\hat q(t) - G_1(t)U_1(t) \qquad (3.73)$$

Furthermore, $\hat q$ is given as the solution of (3.66), which is

$$\dot{\hat q} = \Gamma(t)\hat q(t) + N(t)H_2(t)\hat X_1(t) + \Omega(t)[Z_1(t) - H_1(t)\hat X_1(t)] \qquad (3.74)$$


Behn and Ho [3] have solved this problem for the case in which $\eta_1 = 0$, and their result is that

$$U_1 = K\begin{bmatrix} X \\ \epsilon_2 \end{bmatrix} \qquad (3.75)$$

where $K$ may be written $K = [K_1 : D_p]$ and $K_1$ is the deterministic optimal feedback gain derived in Chapter 1. Since we may write $\epsilon_2 = X - Cq$, Behn and Ho's solution may also be written

$$U_1 = (K_1 + D_p)X - D_pCq \qquad (3.76)$$

Then if we fix controller number 2's strategy, i.e., require that he continue to play as if $\eta_1 = 0$, the problem is simply a stochastic optimal control problem. We may apply the separation principle to obtain

$$U_1 = \begin{bmatrix} K_1 + D_p & -D_pC \end{bmatrix}\begin{bmatrix} \hat X_1 \\ \hat q \end{bmatrix} \qquad (3.77)$$

Thus, for this special case we have solved the game problem. The result may be written

$$U_1 = K_1\hat X_1 + D_p[\hat X_1 - C\hat q] = K_1[\hat X_1 + L_1(Z_1 - \hat Z_1)] \qquad (3.78)$$

Therefore, $L_1$ satisfies

$$K_1L_1(Z_1 - \hat Z_1) = D_p[\hat X_1 - C\hat q] \qquad (3.79)$$

From  a  computational  standpoint,  such  a  requirement  is  unreasonable; 
thus,  a  game  strategy  which  incorporates  the  conditional  mean  of  the 
state,  the  conditioning  being  done  on  all  past  observations,  is  not 
satisfactory  from  an  engineering  viewpoint  unless  the  opposing  strategy 
is  known  to  be  dimensionally  restricted.  If  the  opposing  strategy  is  in 
fact  dimensionally  restricted,  the  resulting  game  situation  is 
unsymmetrical. 

3.4 A Suboptimal Estimation Procedure

An interesting suboptimal state-estimation procedure has been developed by Rhodes and Luenberger [24], the significance of which will be shown in Chapter 4. The method uses state estimates generated by differential equations which are of the same order as the controlled system. This procedure is a compromise between estimation error and computational difficulty. We have already developed a differential equation (3.44) describing the conditional mean of the state:

$$\dot{\hat X}_1 = F(t)\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[Z_1(t) - H_1(t)\hat X_1(t)] + G_2K_2\hat X_{21}(t) - G_1(t)U_1(t) \qquad (3.80)$$


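We recall that the problem of dimensionality enters the picture in the calculation of $\hat X_{21}(t)$. As a simplifying assumption, let us take $\hat X_{21}(t)$ to be approximated by some linear transformation of $\hat X_1(t)$; i.e., let

$$\hat X_{21}(t) = \Omega_1(t)\hat X_1(t) \qquad (3.81)$$

where $\Omega_1(t)$ is to be chosen according to some criterion of optimality. For reasons which we shall see later, a desirable criterion is mean-square error; i.e., we choose $\Omega_1$ to minimize

$$\tfrac{1}{2}\operatorname{tr}\left[\operatorname{Cov}(\hat X_2 - \Omega_1\hat X_1)\right] = \tfrac{1}{2}\operatorname{tr}E\{(\hat X_2 - \Omega_1\hat X_1)(\hat X_2 - \Omega_1\hat X_1)^*\} = \tfrac{1}{2}\operatorname{tr}E\{\hat X_2\hat X_2^* - 2\hat X_2\hat X_1^*\Omega_1^* + \Omega_1\hat X_1\hat X_1^*\Omega_1^*\} \qquad (3.82)$$

Taking the gradient with respect to $\Omega_1$ and setting the resulting expression equal to zero, we have

$$E\{\hat X_2\hat X_1^*\} - \Omega_1E\{\hat X_1\hat X_1^*\} = 0, \qquad \Omega_1 = E\{\hat X_2\hat X_1^*\}\left[E\{\hat X_1\hat X_1^*\}\right]^{-1} \qquad (3.83)$$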

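Now, if $\epsilon_1 = X - \hat X_1$ and $\epsilon_2 = X - \hat X_2$, and we define the vector $p$ by

$$p = \begin{bmatrix} X \\ \epsilon_1 \\ \epsilon_2 \end{bmatrix} \qquad (3.84)$$

and let

$$\operatorname{Cov} p = E\{pp^*\} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} \qquad (3.85)$$

then

$$\hat X_2\hat X_1^* = (X - \epsilon_2)(X - \epsilon_1)^* \qquad (3.86)$$

$$E\{\hat X_2\hat X_1^*\} = P_{00} - P_{01} - P_{20} + P_{21}, \qquad E\{\hat X_1\hat X_1^*\} = P_{00} - P_{01} - P_{10} + P_{11} \qquad (3.87)$$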


where Cq is controller number 2's $p$-dimensional state estimator, which is designed with the assumption that $\Gamma_1 = 0$. Behn and Ho have shown that under this assumption $p = n$ and $C = I$; i.e., controller number 2 needs only an $n$-dimensional state estimate, in this case generated by a Kalman filter.

This problem is not, however, a true game problem, since all of the parameters of controller number 2's strategy are fixed. But, since the purpose of this chapter is to analyze the problem of state estimation, with game-theoretic considerations suppressed temporarily, we proceed in that vein.

Because controller number 2's $p\,(=n)$-dimensional state estimate is based on erroneous assumptions, it is not certain how good a state estimate it is. Even within the class of $n$-dimensional estimators, it may not be optimal either as an estimator or as a strategic variable. Clearly, from controller number 2's viewpoint, the "$p$-dimensional" state estimate is inferior to a "$(2n + p)$-dimensional" estimator, which he would use were he not constrained. Inductively, we conclude that no state estimates generated by finite-order differential equations can make optimum use of all of the information contained in the observations.


The reason for the difficulty encountered in making the state estimates is that the "state" of the system includes the "state" of each controller's estimate. When a system is described by a differential equation, its state estimate is also described by a differential equation of the same order. When a system is described by an integral expression, its state estimate is also an integral expression; and to compute this integral, all past values of the observations must be retained.

Taking expectations in (3.87), we have

$$E\{\hat X_2\hat X_1^*\} = P_{00} - P_{01} - P_{20} + P_{21} \qquad (3.88)$$

and

$$E\{\hat X_1\hat X_1^*\} = E\{(X - \epsilon_1)\hat X_1^*\} = E\{X\hat X_1^*\} - E\{\epsilon_1\hat X_1^*\} \qquad (3.89)$$

It is a property of optimal estimates [8, pp. 38-43] that $E\{\epsilon_1\hat X_1^*\} = 0$. For the moment, we shall assume this to be true for our estimate also and verify the fact later. Thus, (3.89) may be written

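$$E\{\hat X_1\hat X_1^*\} = E\{X\hat X_1^*\} = E\{X(X - \epsilon_1)^*\} = P_{00} - P_{01} \qquad (3.90)$$

Using (3.88) and (3.90), (3.84) may be written

$$\Omega_1 = [P_{00} - P_{01} - P_{20} + P_{21}]\,[P_{00} - P_{01}]^{-1} = I - [P_{20} - P_{21}]\,[P_{00} - P_{01}]^{-1} \qquad (3.91)$$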


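Substituting this expression into (3.80), we have

$$\dot{\hat X}_1 = F(t)\,\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[z_1(t) - H_1(t)\hat X_1(t)] + G_2(t)K_2(t)\left\{I - [P_{20} - P_{21}][P_{00} - P_{01}]^{-1}\right\}\hat X_1(t) - G_1(t)U_1(t) \qquad (3.92)$$

or

$$\dot{\hat X}_1 = \left\{F(t) + G_2(t)K_2(t)\left[I - (P_{20} - P_{21})(P_{00} - P_{01})^{-1}\right]\right\}\hat X_1(t) + P_{11}(t)H_1^*(t)R_1^{-1}(t)[z_1(t) - H_1(t)\hat X_1(t)] - G_1(t)U_1(t) \qquad (3.93)$$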


We have thus derived an "$n$-dimensional" state estimator for controller number 1. This estimator is given in terms of the covariances $P_{20}$, $P_{21}$, $P_{00}$, $P_{01}$, and $P_{11}$, quantities which must be calculated separately. Note that as $P_{20} - P_{21} \to 0$, $\Omega_1 \to I$.
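As a numerical sanity check of the step from (3.84) to (3.91), the following minimal Python sketch works in the scalar case $n = 1$ with synthetic, hypothetical sample data. It imposes the orthogonality $E\{\epsilon_1\hat X_1^*\} = 0$ on the sample, which is the property used in (3.90). It then compares the regression coefficient of (3.84) with the block formula (3.91). The names `omega1_from_blocks`, `second_moments`, and `make_samples` are illustrative only.

```python
import random


def omega1_from_blocks(P):
    """Scalar form of (3.91): Omega_1 = 1 - (P20 - P21) / (P00 - P01)."""
    return 1.0 - (P[2][0] - P[2][1]) / (P[0][0] - P[0][1])


def second_moments(samples):
    """Sample version of P = E{p p*} for p = (X, eps1, eps2)."""
    n = len(samples)
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(3)] for i in range(3)]


def make_samples(count=5000, seed=1):
    """Hypothetical scalar data in which eps1 is orthogonal to X_hat_1 in the sample."""
    rng = random.Random(seed)
    xh1 = [1.0 + rng.gauss(0.0, 1.0) for _ in range(count)]
    raw = [0.4 * rng.gauss(0.0, 1.0) for _ in range(count)]
    # Gram-Schmidt step: remove the component of raw along xh1, so sum(eps1 * xh1) = 0.
    c = sum(r * a for r, a in zip(raw, xh1)) / sum(a * a for a in xh1)
    eps1 = [r - c * a for r, a in zip(raw, xh1)]
    x = [a + e for a, e in zip(xh1, eps1)]
    eps2 = [0.5 * e + 0.3 * rng.gauss(0.0, 1.0) for e in eps1]
    return list(zip(x, eps1, eps2))


samples = make_samples()
P = second_moments(samples)
xh1 = [x - e1 for x, e1, _ in samples]
xh2 = [x - e2 for x, _, e2 in samples]
# Direct least-squares coefficient of (3.84) in the scalar case.
direct = sum(a * b for a, b in zip(xh2, xh1)) / sum(a * a for a in xh1)
print(direct, omega1_from_blocks(P))  # agree to rounding error
```

The agreement depends on the orthogonality property assumed above. That property is verified for the estimator in (3.119).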

It is impossible to calculate these covariances, however, unless we have some knowledge of the form of $\hat X_2$. We therefore assume that $\hat X_2(t)$ is an $n$-dimensional estimator of the same form as $\hat X_1(t)$ and is thus described by the differential equation


$$\dot{\hat X}_2 = F(t)\,\hat X_2(t) + P_{22}(t)H_2^*(t)R_2^{-1}(t)[z_2(t) - H_2(t)\hat X_2(t)] - G_1(t)K_1(t)\hat X_{12}(t) + G_2(t)U_2(t) \qquad (3.94)$$


Again, we approximate $\hat X_{12}(t)$ by $\Omega_2(t)\hat X_2(t)$ and by an analogous manipulation obtain

$$\Omega_2(t) = I - (P_{10} - P_{12})(P_{00} - P_{02})^{-1} \qquad (3.95)$$


Thus, (3.94) becomes

$$\dot{\hat X}_2 = \left\{F(t) - G_1(t)K_1(t)\left[I - (P_{10} - P_{12})(P_{00} - P_{02})^{-1}\right]\right\}\hat X_2(t) + P_{22}(t)H_2^*(t)R_2^{-1}(t)[z_2(t) - H_2(t)\hat X_2(t)] + G_2(t)U_2(t) \qquad (3.96)$$

We are now in a position to calculate the covariances. We begin with the system equation

$$\dot X(t) = F(t)X(t) - G_1(t)K_1(t)\hat X_1(t) + G_2(t)K_2(t)\hat X_2(t) \qquad (3.97)$$

This may be rewritten, dropping the argument $t$, as

$$\dot X = [F - G_1K_1 + G_2K_2]\,X + G_1K_1\epsilon_1 - G_2K_2\epsilon_2 \qquad (3.98)$$


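We may express $\dot\epsilon_1 = \dot X - \dot{\hat X}_1$ by

$$\dot\epsilon_1 = G_2K_2(P_{20} - P_{21})(P_{00} - P_{01})^{-1}X + \left\{F + G_2K_2\left[I - (P_{20} - P_{21})(P_{00} - P_{01})^{-1}\right] - P_{11}H_1^*R_1^{-1}H_1\right\}\epsilon_1 - G_2K_2\epsilon_2 - P_{11}H_1^*R_1^{-1}\eta_1 \qquad (3.99)$$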


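Similarly,

$$\dot\epsilon_2 = -G_1K_1(P_{10} - P_{12})(P_{00} - P_{02})^{-1}X + G_1K_1\epsilon_1 + \left\{F - G_1K_1\left[I - (P_{10} - P_{12})(P_{00} - P_{02})^{-1}\right] - P_{22}H_2^*R_2^{-1}H_2\right\}\epsilon_2 - P_{22}H_2^*R_2^{-1}\eta_2 \qquad (3.100)$$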

We may write a differential equation describing the vector $p$ as follows:

$$\dot p = \Gamma p + B\eta, \qquad \eta = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} \qquad (3.101)$$
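Then $P = \operatorname{Cov}(p)$ satisfies the differential equation

$$\dot P = \Gamma P + P\Gamma^* + BRB^*$$

where

$$R = \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix}$$

Denoting the submatrices of $\Gamma$ and $B$ by

$$\Gamma = \begin{bmatrix} \Gamma_{00} & \Gamma_{01} & \Gamma_{02} \\ \Gamma_{10} & \Gamma_{11} & \Gamma_{12} \\ \Gamma_{20} & \Gamma_{21} & \Gamma_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ -B_1 & 0 \\ 0 & -B_2 \end{bmatrix}, \qquad B_1 = P_{11}H_1^*R_1^{-1}, \quad B_2 = P_{22}H_2^*R_2^{-1}$$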


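and expanding the equation for $\dot P$ block by block, we have

$$\dot P_{ij} = \sum_{k=0}^{2}\left(\Gamma_{ik}P_{kj} + P_{ik}\Gamma_{jk}^*\right) + [BRB^*]_{ij}, \qquad i, j = 0, 1, 2$$

From (3.98) through (3.100), the submatrices of $\Gamma$ satisfy

$$\Gamma_{12} - \Gamma_{02} = 0, \qquad \Gamma_{10} - \Gamma_{00} = -(\Gamma_{11} - \Gamma_{01}) - B_1H_1 \qquad (3.116)$$

Moreover, $\Gamma_{10}(P_{01} - P_{00}) + \Gamma_{12}(P_{21} - P_{20}) = 0$ because of the form of $\Gamma_{10}$ in (3.99), and $B_1R_1B_1^* = P_{11}H_1^*B_1^*$. Subtracting the equation for $\dot P_{10}$ from that for $\dot P_{11}$ therefore gives

$$\dot P_{11} - \dot P_{10} = \Gamma_{11}(P_{11} - P_{10}) - (P_{11} - P_{10})(\Gamma_{10} - \Gamma_{00})^* \qquad (3.117)$$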

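Since

$$P_{10} = E\{\epsilon_1X^*\} = E\{\epsilon_1(\hat X_1 + \epsilon_1)^*\} = E\{\epsilon_1\hat X_1^*\} + P_{11} \qquad (3.118)$$

we may, by choosing $P_{11}(0) = P_{10}(0) = \operatorname{Cov}[X(0)]$, ensure that $P_{11}(t) = P_{10}(t)$ for all time, because the equation for $\dot P_{11} - \dot P_{10}$ derived above is linear and homogeneous in $P_{11} - P_{10}$. This forces

$$E\{\epsilon_1\hat X_1^*\} = 0 \quad \text{for all time.} \qquad (3.119)$$

This condition was assumed in the derivation of $\Omega_1(t)$ and is now verified. A parallel development will show that, by choosing $P_{22}(0) = P_{20}(0) = \operatorname{Cov}[X(0)]$, we can guarantee that

$$P_{22}(t) = P_{20}(t) \quad \text{for all time;} \qquad (3.120)$$

thus,

$$E\{\epsilon_2\hat X_2^*\} = 0 \quad \text{for all time.} \qquad (3.121)$$

Note that (3.119) is true regardless of the form assumed for $\hat X_2$.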
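The invariance argument can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions. It integrates the covariance equation for $n = 1$ with forward Euler, builds $\Gamma$ from (3.98)-(3.100), and recomputes $\Omega_1$ and $\Omega_2$ from the current $P$ at each step. All parameter values are hypothetical.

$X(0)$ is given a nonzero mean so that $P_{00} - P_{01} = E\{\hat X_1\hat X_1^*\}$ is nonzero at $t = 0$. As a result, $P$ is the second-moment matrix $E\{pp^*\}$ of (3.86). Starting from $P_{11}(0) = P_{10}(0)$ and $P_{22}(0) = P_{20}(0)$, both differences should stay at rounding level.

```python
import math


def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def gamma_and_noise(P, F, G1, G2, K1, K2, H1, H2, R1, R2):
    """Scalar (n = 1) Gamma of (3.98)-(3.100) and the term B R B*."""
    B1 = P[1][1] * H1 / R1                              # P11 H1* R1^-1
    B2 = P[2][2] * H2 / R2                              # P22 H2* R2^-1
    M1 = (P[2][0] - P[2][1]) / (P[0][0] - P[0][1])      # I - Omega_1, eq. (3.91)
    M2 = (P[1][0] - P[1][2]) / (P[0][0] - P[0][2])      # I - Omega_2, eq. (3.95)
    gamma = [
        [F - G1 * K1 + G2 * K2, G1 * K1, -G2 * K2],
        [G2 * K2 * M1, F + G2 * K2 * (1.0 - M1) - B1 * H1, -G2 * K2],
        [-G1 * K1 * M2, G1 * K1, F - G1 * K1 * (1.0 - M2) - B2 * H2],
    ]
    brb = [[0.0, 0.0, 0.0], [0.0, B1 * B1 * R1, 0.0], [0.0, 0.0, B2 * B2 * R2]]
    return gamma, brb


def propagate(P, T, dt, params):
    """Forward-Euler integration of P' = Gamma P + P Gamma* + B R B*."""
    for _ in range(int(round(T / dt))):
        gamma, brb = gamma_and_noise(P, **params)
        gp = mat_mul(gamma, P)
        pg = mat_mul(P, transpose(gamma))
        P = [[P[i][j] + dt * (gp[i][j] + pg[i][j] + brb[i][j]) for j in range(3)]
             for i in range(3)]
    return P


# Hypothetical data: X(0) has mean m and variance s2; both estimators start at m.
m, s2 = 1.0, 1.0
P0 = [[s2 + m * m, s2, s2], [s2, s2, s2], [s2, s2, s2]]   # P11(0) = P10(0), P22(0) = P20(0)
params = dict(F=-0.5, G1=1.0, G2=1.0, K1=1.0, K2=0.5, H1=1.0, H2=1.0, R1=1.0, R2=1.0)
PT = propagate(P0, T=2.0, dt=1e-3, params=params)
print(PT[1][1] - PT[1][0], PT[2][2] - PT[2][0])   # both remain at rounding level
```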

Thus far, we have assumed a specific form for $\hat X_2$. We will now relax this assumption and assume that $\hat X_2$ is obtained by an arbitrary function of the observed data $z_2$. Then (3.100) becomes

$$\dot\epsilon_2 = [F - G_1K_1 + G_2K_2]\,X + G_1K_1\epsilon_1 - G_2K_2\epsilon_2 - \dot{\hat X}_2 \qquad (3.122)$$

and (3.101) becomes

$$\dot p = \Gamma p + B\eta_1 + C\dot{\hat X}_2 \qquad (3.123)$$


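where $\Gamma$ is the same as before except for the third row, which becomes

$$\Gamma_{20} = F - G_1K_1 + G_2K_2 \qquad (3.124)$$

$$\Gamma_{21} = G_1K_1 \qquad (3.125)$$

$$\Gamma_{22} = -G_2K_2 \qquad (3.126)$$

and where

$$B = \begin{bmatrix} 0 \\ -P_{11}H_1^*R_1^{-1} \\ 0 \end{bmatrix} \qquad (3.127)$$

and

$$C = \begin{bmatrix} 0 \\ 0 \\ -I \end{bmatrix} \qquad (3.128)$$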


Now $P = E\{pp^*\}$ satisfies the differential equation

$$\dot P = \Gamma P + P\Gamma^* + BR_1B^* + C\,E\{\dot{\hat X}_2p^*\} + E\{p\,\dot{\hat X}_2^*\}\,C^* \qquad (3.129)$$


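Since $C\,E\{\dot{\hat X}_2p^*\}$ is of the form

$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -E\{\dot{\hat X}_2X^*\} & -E\{\dot{\hat X}_2\epsilon_1^*\} & -E\{\dot{\hat X}_2\epsilon_2^*\} \end{bmatrix} \qquad (3.130)$$

and $E\{p\,\dot{\hat X}_2^*\}\,C^*$ is of the form

$$\begin{bmatrix} 0 & 0 & -E\{X\dot{\hat X}_2^*\} \\ 0 & 0 & -E\{\epsilon_1\dot{\hat X}_2^*\} \\ 0 & 0 & -E\{\epsilon_2\dot{\hat X}_2^*\} \end{bmatrix} \qquad (3.131)$$

the equations for $\dot P_{11}$ and $\dot P_{10}$ are unchanged from (3.110) and (3.111), and equation (3.119) is valid regardless of the form of $\hat X_2$. In order to calculate $\Omega_1$, however, player number 1 must make some assumption about the form of $\hat X_2$.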

One need be no more general in his assumptions about the form of $\hat X_2$ than to assume that $\hat X_2$ is generated by a $2n$th-order differential equation, because from player number 2's viewpoint the system is described by the set of differential equations

$$\dot X = FX - G_1K_1\hat X_1 + G_2U_2$$

$$\dot{\hat X}_1 = \left\{F + G_2K_2\left[I - (P_{20} - P_{21})(P_{00} - P_{01})^{-1}\right]\right\}\hat X_1 + P_{11}H_1^*R_1^{-1}[z_1 - H_1\hat X_1] - G_1K_1\hat X_1 \qquad (3.132)$$


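and observation equation

$$z_2 = H_2X + \eta_2 = H_{20}Y + \eta_2, \qquad Y = \begin{bmatrix} X \\ \hat X_1 \end{bmatrix}, \quad H_{20} = [H_2 \;\; 0] \qquad (3.133)$$

and $\hat Y$ satisfies

$$\dot{\hat Y} = A\hat Y + K[z_2 - H_{20}\hat Y] \qquad (3.134)$$

and

$$K = P_2H_{20}^*R_2^{-1} \qquad (3.135)$$

where

$$P_2 = \operatorname{Cov}(Y - \hat Y) \qquad (3.136)$$

and

$$\dot P_2 = AP_2 + P_2A^* + BR_1B^* - P_2H_{20}^*R_2^{-1}H_{20}P_2 \qquad (3.137)$$

with $A$ and $B$ the $2n$-dimensional system and noise-input matrices of (3.132) written in terms of $Y$.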

Furthermore, the separation principle asserts that the optimal control is given by $U_2 = K_2\hat Y$. This points up an important fact about the game problem: if one player is constrained to using an $n$-dimensional control strategy, the opposing player's unconstrained optimal control strategy, if it exists, is no more than $2n$-dimensional.


Chapter 4

THE  DIFFERENTIAL  GAME  PROBLEM  WITH  DIMENSIONALLY 
CONSTRAINED  CONTROL  STRATEGIES 


4.1 Introduction

In Chapter 3 it was shown that when the two controllers were not constrained dimensionally, they could not generate the conditional mean of the state with finite-dimensional computing methods. In Chapter 2 it was shown that the optimal linear strategies can be written in terms of a conditional mean of the state plus some additional terms. It would thus appear that an overall optimal linear control strategy could not be generated unless the controller retains all his past observations for use in computing the control. In many real engineering situations, however, such a requirement may not be practically met. Thus, we may wish to specify control strategies which are, first, computationally practical and, second, optimal within the class of strategies satisfying whatever computational efficiency criterion we select.

4.2 The Dimensionality Constraint

We shall examine here the nature of control strategies which are optimal within the class of strategies which can be written in the form

$$U_1(t) = K_1(t)\,\hat X_1(t) \qquad (4.1)$$

$$U_2(t) = K_2(t)\,\hat X_2(t) \qquad (4.2)$$

where $\hat X_1(t)$ and $\hat X_2(t)$ are in some sense $n$-dimensional "estimates" of the state which satisfy the differential equations

$$\dot{\hat X}_1 = (A_1 - G_1K_1)\hat X_1 + B_1(z_1 - H_1\hat X_1); \qquad \hat X_1(0) = X_0 \qquad (4.3)$$

$$\dot{\hat X}_2 = (A_2 + G_2K_2)\hat X_2 + B_2(z_2 - H_2\hat X_2); \qquad \hat X_2(0) = X_0 \qquad (4.4)$$


where $K_i(t)$, $A_i(t)$, and $B_i(t)$, $i = 1, 2$, are unspecified and must be chosen in a manner which will optimize the criterion functional. A restriction of this problem which we may also wish to consider is that in which some of the parameters are specified and only the remaining unspecified quantities must be selected.

This approach has been considered in problems of both state estimation and stochastic optimal control [14]. In these cases its appeal lies in its potential as a computationally efficient suboptimal estimation/control scheme. In the two-input situation the dimensionality constraint appears to be motivated more by necessity than by mere economy.

4.3 A Specialized Relationship

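Rhodes and Luenberger [23] have taken the above approach to a problem closely related to the one under consideration here and have derived the following result, presented here without proof.

Theorem 4.3a

For the stochastic differential game problem described by (1.1A), (1.2A), and (1.3A) with controls given by

$$U_1^0 = K_1\hat X_1 \qquad (4.5)$$

$$U_2^0 = K_2\hat X_2 \qquad (4.6)$$

where $K_1$ and $K_2$ are given by (1.52) and (1.53), respectively, and $\hat X_1$ and $\hat X_2$ satisfy equations of the form (4.3) and (4.4), with

$$A_1 = F - G_1K_1 + G_2K_2\left[I - (P_{20} - P_{21})(P_{00} - P_{01})^{-1}\right] \qquad (4.7)$$

$$A_2 = F - G_1K_1\left[I - (P_{10} - P_{12})(P_{00} - P_{02})^{-1}\right] + G_2K_2 \qquad (4.8)$$

$$B_1 = P_{11}H_1^*R_1^{-1} \qquad (4.9)$$

$$B_2 = P_{22}H_2^*R_2^{-1} \qquad (4.10)$$

and with $P_{ij}$ as defined in Chapter 3, the following inequalities hold:

$$E\{J(U_1^0, U_2^0)\mid\hat X_1\} \le E\{J(U_1, U_2^0)\mid\hat X_1\} \qquad (4.11)$$

$$E\{J(U_1^0, U_2^0)\mid\hat X_2\} \ge E\{J(U_1^0, U_2)\mid\hat X_2\} \qquad (4.12)$$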


This result appears to be stronger than it is. Equations (4.11) and (4.12) merely say that, if the state estimate derived in Chapter 3 is used, the control strategy which optimizes the conditional expected value of the payoff functional is the certainty-equivalent strategy when the conditioning is done on the value of the state estimate. They do not imply that the certainty-equivalent strategy optimizes the conditional expected value of the payoff when the conditioning is done on all past observations. Nor do they say anything about the overall (unconditional) expected value.


4.4 Generalized Relationships for n-Dimensional Control Strategies


We wish to derive some necessary conditions for control strategies of the form described by (4.1) through (4.4) to satisfy the following saddle-point conditions:

$$E\{J(U_1^0, U_2^0)\} \le E\{J(U_1, U_2^0)\} \qquad (4.13)$$

$$E\{J(U_1^0, U_2^0)\} \ge E\{J(U_1^0, U_2)\} \qquad (4.14)$$


In order to put the problem in a format more suitable to our needs, we shall reformulate it somewhat. First, we define the state estimation errors $\epsilon_1$ and $\epsilon_2$ by

$$\epsilon_1 = X - \hat X_1 \qquad (4.15)$$

$$\epsilon_2 = X - \hat X_2 \qquad (4.16)$$

where $\hat X_1$ and $\hat X_2$ are generated by estimators of the form (4.3) and (4.4). Then, using (4.1), (4.2), (4.15), and (4.16), the system equation (1.1A) may be rewritten as

$$\dot X = (F - G_1K_1 + G_2K_2)X + G_1K_1\epsilon_1 - G_2K_2\epsilon_2 \qquad (4.17)$$

Combining (4.17) with (4.3), (4.4), (4.15), (4.16), and (1.3A), we see that the estimation errors $\epsilon_1$ and $\epsilon_2$ satisfy

$$\dot\epsilon_1 = (F - A_1 + G_2K_2)X + (A_1 - B_1H_1)\epsilon_1 - G_2K_2\epsilon_2 - B_1\eta_1 \qquad (4.18)$$

$$\dot\epsilon_2 = (F - A_2 - G_1K_1)X + G_1K_1\epsilon_1 + (A_2 - B_2H_2)\epsilon_2 - B_2\eta_2 \qquad (4.19)$$

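Therefore, (4.17), (4.18), and (4.19) taken together may be written

$$\dot p = \Gamma p + B\eta \qquad (4.20)$$

where

$$p = \begin{bmatrix} X \\ \epsilon_1 \\ \epsilon_2 \end{bmatrix}, \qquad \eta = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix},$$

$$\Gamma = \begin{bmatrix} \Gamma_{00} & \Gamma_{01} & \Gamma_{02} \\ \Gamma_{10} & \Gamma_{11} & \Gamma_{12} \\ \Gamma_{20} & \Gamma_{21} & \Gamma_{22} \end{bmatrix} = \begin{bmatrix} F - G_1K_1 + G_2K_2 & G_1K_1 & -G_2K_2 \\ F - A_1 + G_2K_2 & A_1 - B_1H_1 & -G_2K_2 \\ F - A_2 - G_1K_1 & G_1K_1 & A_2 - B_2H_2 \end{bmatrix},$$

$$B = \begin{bmatrix} 0 & 0 \\ -B_1 & 0 \\ 0 & -B_2 \end{bmatrix}$$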


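Note that the quantity $U_1^*U_1 - U_2^*U_2$ may be written in terms of the vector $p$ as $p^*Qp$, where

$$Q = \begin{bmatrix} K_1^*K_1 - K_2^*K_2 & -K_1^*K_1 & K_2^*K_2 \\ -K_1^*K_1 & K_1^*K_1 & 0 \\ K_2^*K_2 & 0 & -K_2^*K_2 \end{bmatrix} \qquad (4.25)$$

Note also that $X^*(T)X(T)$ may be written in terms of the vector $p$ as

$$X^*(T)X(T) = p^*(T)\,Q_T\,p(T) \qquad (4.26)$$

where

$$Q_T = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad (4.27)$$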


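In view of these relationships, we may write the payoff functional as

$$J = E\left\{p^*(T)Q_Tp(T) + \int_0^T p^*(\tau)Q(\tau)p(\tau)\,d\tau\right\} = \operatorname{tr} E\left\{p(T)p^*(T)Q_T + \int_0^T p(\tau)p^*(\tau)Q(\tau)\,d\tau\right\} \qquad (4.28)$$

And, defining $P(t)$ by

$$P(t) = E\{p(t)p^*(t)\} = \begin{bmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{bmatrix} \qquad (4.29)$$

(4.28) may be written

$$J = \operatorname{tr}\left[P(T)Q_T + \int_0^T P(\tau)Q(\tau)\,d\tau\right] \qquad (4.30)$$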


Thus,  the  stochastic  game  problem  has  been  converted  to  a  deterministic 
game  to  which  classical  deterministic  optimal  control  techniques  may  be 
applied. 
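For $n = 1$ the identities (4.25)-(4.30) reduce to scalar algebra that is easy to confirm. This Python sketch uses hypothetical gains and vectors to check three things:

- $U_1^2 - U_2^2 = p^*Qp$;
- $X^2 = p^*Q_Tp$;
- averaging $p^*Qp$ over samples gives $\operatorname{tr}[PQ]$, with $P$ the sample second-moment matrix. This is the step from (4.28) to (4.30).

```python
def quad(p, m):
    """p* M p for a 3-vector p and a 3x3 matrix M (scalar blocks, n = 1)."""
    return sum(p[i] * m[i][j] * p[j] for i in range(3) for j in range(3))


def q_matrix(k1, k2):
    """Scalar version of Q in (4.25)."""
    a, b = k1 * k1, k2 * k2
    return [[a - b, -a, b], [-a, a, 0.0], [b, 0.0, -b]]


Q_T = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # (4.27) with n = 1
k1, k2 = 0.8, -1.3                                          # hypothetical gains
Q = q_matrix(k1, k2)

x, e1, e2 = 0.7, -0.2, 0.45          # hypothetical X, eps1, eps2
p = [x, e1, e2]
u1 = k1 * (x - e1)                   # U1 = K1 X_hat_1 with X_hat_1 = X - eps1
u2 = k2 * (x - e2)                   # U2 = K2 X_hat_2 with X_hat_2 = X - eps2
print(u1 * u1 - u2 * u2, quad(p, Q))   # same value, cf. (4.25)
print(x * x, quad(p, Q_T))             # same value, cf. (4.26)

# Averaging p* Q p over samples equals tr[P Q] with P the sample E{p p*}.
samples = [p, [0.1, 0.3, -0.4], [-1.2, 0.05, 0.6]]
P = [[sum(s[i] * s[j] for s in samples) / len(samples) for j in range(3)] for i in range(3)]
trace_PQ = sum(P[i][j] * Q[j][i] for i in range(3) for j in range(3))
mean_quad = sum(quad(s, Q) for s in samples) / len(samples)
print(trace_PQ, mean_quad)
```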

In  applying  these  classical  techniques,  we  first  note  that  the 
matrix  P(t)  satisfies  the  differential  equation 


$$\dot P(t) = \Gamma(t)P(t) + P(t)\Gamma^*(t) + B(t)R(t)B^*(t) \qquad (4.31)$$

where

$$R(t) = \begin{bmatrix} R_1(t) & 0 \\ 0 & R_2(t) \end{bmatrix} \qquad (4.32)$$


We then form the Hamiltonian corresponding to the payoff functional (4.30) and the differential equation constraint (4.31), which is

$$H(A_1, A_2, B_1, B_2, K_1, K_2) = -\operatorname{tr}[PQ] + \operatorname{tr}[\Lambda(\Gamma P + P\Gamma^* + BRB^*)] \qquad (4.33)$$

where $\Lambda$ is a Lagrange multiplier matrix which satisfies the canonical Euler-Lagrange equation

$$\dot\Lambda = -\frac{\partial H}{\partial P} = -\Lambda\Gamma - \Gamma^*\Lambda + Q, \qquad \Lambda(T) = -Q_T \qquad (4.34)$$

where the gradient operation is as defined in Chapter 1 and thus


(4.35) 
According to the Maximum Principle, we wish to select $A_1$, $B_1$, and $K_1$ so as to minimize $H$ and to select $A_2$, $B_2$, and $K_2$ so as to maximize $H$. We shall see that the order of maximization and minimization does not matter. Since $A_1$, $A_2$, $B_1$, $B_2$, $K_1$, and $K_2$ are incorporated in various submatrices of $\Gamma$, $B$, and $Q$, we may partition the expression for the Hamiltonian in order to isolate those submatrices of interest for optimization with respect to a particular quantity. Thus, since the matrix $B_1$ appears as a part of $\Gamma$ and $B$, we may write as a necessary condition

$$\frac{\partial H}{\partial B_1} = \frac{\partial}{\partial B_1}\operatorname{tr}[\Lambda\Gamma P + \Lambda P\Gamma^* + \Lambda BRB^*] = \frac{\partial}{\partial B_1}\operatorname{tr}[2P\Lambda\Gamma + \Lambda BRB^*] = \frac{\partial}{\partial B_1}\operatorname{tr}\left\{2P\Lambda\begin{bmatrix} 0 & 0 & 0 \\ 0 & -B_1H_1 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \Lambda\begin{bmatrix} 0 & 0 & 0 \\ 0 & B_1R_1B_1^* & 0 \\ 0 & 0 & 0 \end{bmatrix}\right\} = 0 \qquad (4.36)$$


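It is convenient at this point to partition the $P$ and $\Lambda$ matrices as

$$P = \begin{bmatrix} P_0 \\ P_1 \\ P_2 \end{bmatrix}, \qquad \Lambda = [\Lambda_0 \mid \Lambda_1 \mid \Lambda_2]$$

Here $P_0$, $P_1$, and $P_2$ are $n \times 3n$ matrices, and $\Lambda_0$, $\Lambda_1$, and $\Lambda_2$ are $3n \times n$ matrices. These matrices may be further partitioned when convenient; e.g.,

$$\Lambda_1 = \begin{bmatrix} \Lambda_{01} \\ \Lambda_{11} \\ \Lambda_{21} \end{bmatrix}, \qquad P_1 = [P_{10} \mid P_{11} \mid P_{12}] \qquad (4.37)$$

where the $\Lambda_{i1}$ and $P_{1i}$, $i = 0, 1, 2$, are $n \times n$ matrices.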

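Using this notation, we may write (4.36) as

$$\frac{\partial}{\partial B_1}\operatorname{tr}[-2P_1\Lambda_1B_1H_1 + \Lambda_{11}B_1R_1B_1^*] = -2H_1P_1\Lambda_1 + 2R_1B_1^*\Lambda_{11} = 0 \qquad (4.38)$$

or

$$\Lambda_{11}B_1 = \Lambda_1^*P_1^*H_1^*R_1^{-1} \qquad (4.39)$$

Equation (4.39) is a necessary condition for minimization with respect to the matrix $B_1$. Completely analogous arguments regarding the matrix $B_2$ lead to the expression

$$\Lambda_{22}B_2 = \Lambda_2^*P_2^*H_2^*R_2^{-1} \qquad (4.40)$$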


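The Hamiltonian (4.33) is also quadratic in $K_1$, so we write

$$\frac{\partial H}{\partial K_1} = \frac{\partial}{\partial K_1}\operatorname{tr}[-PQ + \Lambda\Gamma P + \Lambda P\Gamma^*] = \frac{\partial}{\partial K_1}\operatorname{tr}[-PQ + 2P\Lambda\Gamma] = 0 \qquad (4.41)$$

Retaining only the parts of $Q$ and $\Gamma$ that depend on $K_1$, (4.41) may also be written

$$\frac{\partial}{\partial K_1}\operatorname{tr}\left\{-P\begin{bmatrix} K_1^*K_1 & -K_1^*K_1 & 0 \\ -K_1^*K_1 & K_1^*K_1 & 0 \\ 0 & 0 & 0 \end{bmatrix} + 2P\Lambda\begin{bmatrix} -G_1K_1 & G_1K_1 & 0 \\ 0 & 0 & 0 \\ -G_1K_1 & G_1K_1 & 0 \end{bmatrix}\right\} = 0 \qquad (4.42)$$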
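Now note that we may write

$$\operatorname{tr} P\begin{bmatrix} K_1^*K_1 & -K_1^*K_1 & 0 \\ -K_1^*K_1 & K_1^*K_1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \operatorname{tr}\left[(P_1 - P_0)\begin{bmatrix} -I \\ I \\ 0 \end{bmatrix}K_1^*K_1\right] \qquad (4.43)$$

where $P_1$ and $P_0$ are as defined in (4.37). Also note that we may write

$$\operatorname{tr} P\Lambda\begin{bmatrix} -G_1K_1 & G_1K_1 & 0 \\ 0 & 0 & 0 \\ -G_1K_1 & G_1K_1 & 0 \end{bmatrix} = \operatorname{tr}[(P_1 - P_0)(\Lambda_0 + \Lambda_2)G_1K_1] \qquad (4.44)$$

Substituting (4.43) and (4.44) into (4.42), we have

$$\frac{\partial}{\partial K_1}\operatorname{tr}\left\{-(P_1 - P_0)\begin{bmatrix} -I \\ I \\ 0 \end{bmatrix}K_1^*K_1 + 2(P_1 - P_0)(\Lambda_0 + \Lambda_2)G_1K_1\right\} = -2(P_1 - P_0)\begin{bmatrix} -I \\ I \\ 0 \end{bmatrix}K_1^* + 2(P_1 - P_0)(\Lambda_0 + \Lambda_2)G_1 = 0 \qquad (4.45)$$

or

$$(P_1 - P_0)(\Lambda_0 + \Lambda_2)G_1 - (P_1 - P_0)\begin{bmatrix} -I \\ I \\ 0 \end{bmatrix}K_1^* = 0 \qquad (4.46)$$

Again, analogous arguments apply to the feedback matrix $K_2$ and produce the expression

$$(P_2 - P_0)(\Lambda_0 + \Lambda_1)G_2 - (P_2 - P_0)\begin{bmatrix} -I \\ 0 \\ I \end{bmatrix}K_2^* = 0 \qquad (4.47)$$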


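As opposed to the case for the $B$ and $K$ matrices, the Hamiltonian is linear in the $A$ matrices. Consequently, the maximum principle dictates, in the case of the minimizing matrix $A_1$,

$$a_{1ij} = \begin{cases} a_{1ij\,\max}, & \partial H/\partial a_{1ij} > 0 \\ a_{1ij\,\min}, & \partial H/\partial a_{1ij} < 0 \end{cases} \qquad (4.48)$$

where

$$A_1 = \begin{bmatrix} a_{111} & a_{112} & \cdots & a_{11n} \\ a_{121} & a_{122} & \cdots & a_{12n} \\ \vdots & & & \vdots \\ a_{1n1} & a_{1n2} & \cdots & a_{1nn} \end{bmatrix} \qquad (4.49)$$

For matrix $A_2$,

$$a_{2ij} = \begin{cases} a_{2ij\,\min}, & \partial H/\partial a_{2ij} > 0 \\ a_{2ij\,\max}, & \partial H/\partial a_{2ij} < 0 \end{cases} \qquad (4.50)$$

and an expression analogous to (4.49) defines the elements of $A_2$.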

Singular cases exist where neither inequality in (4.48) or (4.50) is satisfied, i.e., where the derivative is equal to zero. In such cases, if the condition can be sustained, some higher-order test, such as the Kelley necessary condition [19], may be applied in an attempt to determine the values of the elements. It will now be shown that, if the necessary conditions for $B_1$ and $K_1$ are satisfied, the entire trajectory lies on a singular surface for $A_1$.


4.5  A  Singular  Surface 

It was mentioned in section 4.2 that, in some restrictions of the game problem defined there, some of the parameters might be specified and thus not be available for optimization. We shall see that it is only under these conditions that the optimal $A$ coefficient matrices would be chosen by (4.48) or (4.50), i.e., be bang-bang. Otherwise, the gradient of the Hamiltonian with respect to the $A$ matrices is zero during the entire interval $[0, T]$. This is shown for the case of the $A_1$ matrix as follows:


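$$\frac{\partial H}{\partial A_1} = \frac{\partial}{\partial A_1}\operatorname{tr}\left\{2P\Lambda\begin{bmatrix} 0 & 0 & 0 \\ -A_1 & A_1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\right\} = \frac{\partial}{\partial A_1}\,2\operatorname{tr}[(P_1 - P_0)\Lambda_1A_1] = 2(P_1 - P_0)\Lambda_1 \qquad (4.51)$$

From the boundary condition given in (4.34), we see that $\Lambda_1 = 0$ at $t = T$ and, thus, a singular condition exists at the boundary.* We shall now show that

$$(P_1 - P_0)\Lambda_1 \equiv 0, \qquad 0 \le t \le T \qquad (4.52)$$

whenever the optimality conditions (4.39) and (4.46) for $B_1$ and $K_1$, respectively, are satisfied.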

Consider

$$\frac{d}{dt}[(P_1 - P_0)\Lambda_1] = (\dot P_1 - \dot P_0)\Lambda_1 + (P_1 - P_0)\dot\Lambda_1 \qquad (4.53)$$
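Now note that $\Gamma_{12} = \Gamma_{02} = -G_2K_2$; therefore,

$$\Gamma_{12} - \Gamma_{02} = 0 \qquad (4.58)$$

Also note that

$$\Gamma_{10} - \Gamma_{00} = -(\Gamma_{11} - \Gamma_{01}) - B_1H_1 \qquad (4.59)$$

Substituting (4.58) and (4.59) into (4.57), we have

$$\dot P_1 - \dot P_0 = (P_1 - P_0)\Gamma^* + (\Gamma_{11} - \Gamma_{01})(P_1 - P_0) - B_1H_1P_0 + [0 \mid B_1R_1B_1^* \mid 0] \qquad (4.60)$$

Therefore,

$$(\dot P_1 - \dot P_0)\Lambda_1 = (P_1 - P_0)\Gamma^*\Lambda_1 + (\Gamma_{11} - \Gamma_{01})(P_1 - P_0)\Lambda_1 - B_1H_1P_0\Lambda_1 + [0 \mid B_1R_1B_1^* \mid 0]\,\Lambda_1 \qquad (4.61)$$

Using equation (4.39), we see that the last term of equation (4.61) may be written

$$[0 \mid B_1R_1B_1^* \mid 0]\,\Lambda_1 = B_1R_1B_1^*\Lambda_{11} = B_1H_1P_1\Lambda_1 \qquad (4.62)$$

so (4.61) becomes

$$(\dot P_1 - \dot P_0)\Lambda_1 = (P_1 - P_0)\Gamma^*\Lambda_1 + (\Gamma_{11} - \Gamma_{01} + B_1H_1)(P_1 - P_0)\Lambda_1 \qquad (4.63)$$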
Adding (4.55) and (4.63) and combining terms, we have

$$\frac{d}{dt}[(P_1 - P_0)\Lambda_1] = -(P_1 - P_0)\Lambda_1\Gamma_{11} + (\Gamma_{11} - \Gamma_{01} + B_1H_1)(P_1 - P_0)\Lambda_1 \qquad (4.64)$$

Since this equation is linear and homogeneous in $(P_1 - P_0)\Lambda_1$, and since $(P_1 - P_0)\Lambda_1 = 0$ at $t = T$, we must have

$$(P_1 - P_0)\Lambda_1 = 0, \qquad 0 \le t \le T \qquad (4.65)$$

Similar relations apply for $\partial H/\partial A_2$. Thus, if we choose the $B$ and $K$ matrices on the basis of the maximum principle, we must look beyond the maximum principle for help in specifying the elements of the $A$ matrices.


Equation (4.65) indicates that, if the optimal values for $B_1$ and $K_1$ are employed, the state trajectory lies in a surface in state space on which the Hamiltonian is, to first order, independent of variations in $A_1$; an analogous condition exists with regard to $A_2$. In some control situations of this type, we may make use of higher-order necessary conditions on variations of $A_1$ or $A_2$. A well-known second-order necessary condition is the Legendre necessary condition, expressed as


(4.66) 


*This section is based in part on material presented by Johansen [14].




This condition is, of course, trivially satisfied for the game problem under consideration here because of (4.65). In cases where (4.66) holds with equality, another necessary condition, the Kelley necessary condition, is sometimes applied. This condition is expressed as

$$(-1)^k\,\frac{\partial}{\partial A_1}\left[\frac{d^{2k}}{dt^{2k}}\,\frac{\partial H}{\partial A_1}\right] \ge 0, \qquad k = 0, 1, 2, \ldots \qquad (4.67)$$

However, the differential equation describing $\partial H/\partial A_1$ is seen to be linear and homogeneous in $\partial H/\partial A_1$, and $\partial H/\partial A_1$ is zero on the boundary. All time derivatives of $\partial H/\partial A_1$ are therefore zero on the singular surface, and (4.67) is also trivially satisfied.


The reason for the apparent paradox is that the problem has been given too many degrees of freedom: if the $B$ and $K$ matrices are chosen optimally, the payoff is actually independent of the $A$ matrices. This aspect of the problem is related to the non-uniqueness of optimal control strategies of the form given by (4.1) through (4.4). As an illustration of this non-uniqueness, we may consider the strategy of controller number 1, which may be written in the form

$$U_1 = K_1\hat X_1 \qquad (4.68)$$

$$\dot{\hat X}_1 = (A_1 - G_1K_1 - B_1H_1)\hat X_1 + B_1z_1; \qquad \hat X_1(0) = X_0 \qquad (4.69)$$


Assume for the moment that $A_1$, $B_1$, and $K_1$ have been specified. As a preliminary step, for notational convenience, we shall define a new matrix $A_0$:

$$A_0 = A_1 - G_1K_1 - B_1H_1 \qquad (4.70)$$

Then (4.68) and (4.69) become

$$U_1 = K_1\hat X_1 \qquad (4.71)$$

$$\dot{\hat X}_1 = A_0\hat X_1 + B_1z_1; \qquad \hat X_1(0) = X_0 \qquad (4.72)$$


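We shall now show that we may arbitrarily change $A_0$ to a new matrix $\tilde A_0$ and that, by adjusting the matrices $B_1$ and $K_1$, we can obtain the same control strategy $U_1$.

We first define a new variable $\tilde X_1$ by

$$\tilde X_1 = D\hat X_1 \qquad (4.73)$$

where $D$ is a differentiable nonsingular matrix to be specified. Then (4.71) and (4.72) may be written

$$U_1 = K_1D^{-1}\tilde X_1 \qquad (4.74)$$

$$\dot{\tilde X}_1 = (\dot D + DA_0)D^{-1}\tilde X_1 + DB_1z_1; \qquad \tilde X_1(0) = D(0)X_0 \qquad (4.75)$$

We then adjust the matrices $K_1$ and $B_1$ by the relationships

$$\tilde K_1 = K_1D^{-1}, \qquad \tilde B_1 = DB_1 \qquad (4.76)$$

Next we choose the matrix $D$, requiring that

$$(\dot D + DA_0)D^{-1} = \tilde A_0 \qquad (4.77)$$

or, equivalently,

$$\dot D = \tilde A_0D - DA_0 \qquad (4.78)$$

This may be done by defining two matrices $\Phi_1$ and $\Phi_2$ which satisfy the differential equations

$$\dot\Phi_1 = \tilde A_0\Phi_1; \qquad \Phi_1(0) = I \qquad (4.79)$$

$$\dot\Phi_2 = -\Phi_2A_0; \qquad \Phi_2(0) = I \qquad (4.80)$$

Then, by direct substitution into (4.78), we verify that

$$D = \Phi_1D_0\Phi_2 \qquad (4.81)$$

where $D_0$ is a nonsingular constant matrix, which we may choose in such a manner that

$$\tilde X_1(0) = X_0 \qquad (4.82)$$

Then from (4.75) we infer that

$$D_0 = I \qquad (4.83)$$

Thus, the control strategy may be written

$$U_1 = \tilde K_1\tilde X_1 \qquad (4.84)$$

$$\dot{\tilde X}_1 = \tilde A_0\tilde X_1 + \tilde B_1z_1; \qquad \tilde X_1(0) = X_0 \qquad (4.85)$$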

We conclude that only in cases where special restrictions apply to the form of the $B$ or $K$ matrices are we unable to specify the $A$ matrices arbitrarily.
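The non-uniqueness argument of (4.70)-(4.85) can be illustrated in the scalar case. There, (4.78) has the explicit solution $D(t) = e^{(\tilde A_0 - A_0)t}$ with $D(0) = D_0 = 1$. The Python sketch below uses hypothetical values for $A_0$, $B_1$, $K_1$, and $\tilde A_0$, and a hypothetical observation record $z_1(t) = \sin t$. It integrates both realizations with fourth-order Runge-Kutta and compares the resulting controls at $t = T$.

```python
import math


def rk4_step(f, t, y, dt):
    """One classical Runge-Kutta step for a tuple-valued ODE y' = f(t, y)."""
    def add(u, v, h):
        return tuple(ui + h * vi for ui, vi in zip(u, v))
    s1 = f(t, y)
    s2 = f(t + dt / 2, add(y, s1, dt / 2))
    s3 = f(t + dt / 2, add(y, s2, dt / 2))
    s4 = f(t + dt, add(y, s3, dt))
    return tuple(yi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for yi, a, b, c, d in zip(y, s1, s2, s3, s4))


def compare_realizations(a0, b1, k1, a0_new, x0=1.0, T=2.0, dt=1e-3, z=math.sin):
    """Return U1 at time T from (4.71)-(4.72) and from (4.84)-(4.85)."""
    def D(t):
        # Scalar solution of (4.78): Phi1 * D0 * Phi2 with D0 = 1.
        return math.exp((a0_new - a0) * t)

    def rhs(t, y):
        x_hat, x_tilde = y
        return (a0 * x_hat + b1 * z(t),                 # (4.72)
                a0_new * x_tilde + D(t) * b1 * z(t))    # (4.85) with B1~ = D B1

    t, y = 0.0, (x0, x0)   # both start at X0, since D0 = I in (4.83)
    for _ in range(int(round(T / dt))):
        y = rk4_step(rhs, t, y, dt)
        t += dt
    u_original = k1 * y[0]                # (4.71)
    u_transformed = (k1 / D(t)) * y[1]    # (4.84) with K1~ = K1 D^-1
    return u_original, u_transformed


u_a, u_b = compare_realizations(a0=-1.0, b1=0.7, k1=2.0, a0_new=0.5)
print(u_a, u_b)   # agree to integration accuracy
```

The internal state $\tilde X_1$ differs from $\hat X_1$ by the growing factor $D(t)$, yet the control is the same. That is the sense in which $\tilde A_0$ can be chosen freely.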


This is not to say that particular values for the A matrices can be specified without concern over the implications. Fixing these values also fixes the values of the B and K matrices and may lead to excessively large or impractical values for them. In some cases, careful selection of the A matrices can lead to considerable simplification of the computation leading to the B and K matrices. A case in point is the one-sided problem.

4.7  Relationships  with  the  One-Sided  Case  and  the  Separation  Principle 


When examining (4.39) in detail, one observes a certain similarity between it and the expression for the Kalman filter gain, which is

$$B_1 = P_{11}H_1^*R_1^{-1} \qquad (4.86)$$

Upon expanding,

$$\Lambda_1^*P_1^* = \Lambda_{01}^*P_{10}^* + \Lambda_{11}^*P_{11}^* + \Lambda_{21}^*P_{12}^*$$

so (4.39) may be written

$$\Lambda_{11}B_1 = [\Lambda_{01}^*P_{10}^* + \Lambda_{11}P_{11} + \Lambda_{21}^*P_{12}^*]\,H_1^*R_1^{-1} \qquad (4.87)$$

Then, if $\Lambda_{01}^*P_{10}^* + \Lambda_{21}^*P_{12}^* = 0$, (4.87) would be satisfied by (4.86). By examining the differential equations describing $\Lambda_{01}$, $\Lambda_{21}$, $P_{10}$, and $P_{12}$, however, it can be seen that this sum is not identically zero on $0 \le t \le T$. This is one example of the non-separability of the problem: the filter gain $B_1$ depends explicitly on the elements of the $\Lambda$ matrix, which in turn depend on the feedback gain $K_1$.
A similar situation is encountered when we examine (4.46), describing $K_1$. If a value for $K_1$ could be found satisfying

$$(\Lambda_0 + \Lambda_2)G_1 = \begin{bmatrix} -I \\ I \\ 0 \end{bmatrix}K_1^* \qquad (4.88)$$

this value would also satisfy (4.46) and would be explicitly independent of the elements of the $P$ matrix. For (4.88) to be satisfied, however, it would be necessary that $(\Lambda_{20} + \Lambda_{22})G_1 = 0$, and this condition is not generally true. Another condition which would render $K_1$ independent of the $P$ matrix is

$$(P_{12} - P_{02})(\Lambda_{20} + \Lambda_{22})G_1 = 0 \qquad (4.89)$$

Again, however, examination of the differential equations describing $P$ and $\Lambda$ shows that (4.89) is not generally true.

Therefore, as a result of the above situations, the solutions to (4.39) and (4.46) are

$$B_1 = P_{11}H_1^*R_1^{-1} + \Lambda_{11}^{-1}[\Lambda_{01}^*P_{10}^* + \Lambda_{21}^*P_{12}^*]\,H_1^*R_1^{-1} \qquad (4.90)$$

$$K_1 = G_1^*(\Lambda_0 + \Lambda_2)^*(P_1 - P_0)^*\,[P_{00} - P_{10} + P_{11} - P_{01}]^{-1} \qquad (4.91)$$

assuming that the indicated inverses exist.
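Because the Hamiltonian is quadratic in $B_1$ and $K_1$, (4.90) and (4.91) can be checked directly as stationary points. The sketch below assumes $n = 1$ and hypothetical symmetric matrices for $P$ and $\Lambda$. It evaluates only the $B_1$- and $K_1$-dependent parts of $H$ from the full $3 \times 3$ products in (4.36) and (4.42). It then confirms by central differences that the derivatives vanish at the closed-form values. It checks stationarity only, not whether the point is a minimum or a maximum.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def trace(a):
    return sum(a[i][i] for i in range(3))


# Hypothetical symmetric P and Lambda for n = 1, indexed 0, 1, 2 as in (4.29).
P = [[2.0, 0.6, 0.4], [0.6, 0.9, 0.3], [0.4, 0.3, 1.1]]
LAM = [[-1.5, 0.2, -0.3], [0.2, 0.7, 0.1], [-0.3, 0.1, -0.4]]
G1, H1, R1 = 1.0, 0.8, 0.5


def h_b1(b):
    """B1-dependent part of H, cf. (4.36): tr[2 P Lam Gamma_B + Lam B R B*]."""
    gamma_b = [[0.0, 0.0, 0.0], [0.0, -b * H1, 0.0], [0.0, 0.0, 0.0]]
    brb = [[0.0, 0.0, 0.0], [0.0, b * R1 * b, 0.0], [0.0, 0.0, 0.0]]
    return 2.0 * trace(mat_mul(mat_mul(P, LAM), gamma_b)) + trace(mat_mul(LAM, brb))


def h_k1(k):
    """K1-dependent part of H, cf. (4.42): tr[-P Q_K + 2 P Lam Gamma_K]."""
    q_k = [[k * k, -k * k, 0.0], [-k * k, k * k, 0.0], [0.0, 0.0, 0.0]]
    gamma_k = [[-G1 * k, G1 * k, 0.0], [0.0, 0.0, 0.0], [-G1 * k, G1 * k, 0.0]]
    return -trace(mat_mul(P, q_k)) + 2.0 * trace(mat_mul(mat_mul(P, LAM), gamma_k))


# Closed forms (4.90) and (4.91) in the scalar case.
b_star = (P[1][1] * H1 / R1
          + (LAM[0][1] * P[1][0] + LAM[2][1] * P[1][2]) * H1 / (LAM[1][1] * R1))
c_k = sum((P[1][j] - P[0][j]) * (LAM[j][0] + LAM[j][2]) for j in range(3))
k_star = G1 * c_k / (P[0][0] - P[1][0] + P[1][1] - P[0][1])

step = 1e-4
d_b = (h_b1(b_star + step) - h_b1(b_star - step)) / (2 * step)
d_k = (h_k1(k_star + step) - h_k1(k_star - step)) / (2 * step)
print(b_star, d_b)   # d_b should be zero up to rounding
print(k_star, d_k)   # d_k should be zero up to rounding
```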

Now notice that, since we may choose the $A$ matrix arbitrarily, as shown in section 4.6, a particularly good choice is $A_1 = F$. As can easily be demonstrated, this choice results in

$$P_{01} = P_{11} = P_{10} \qquad (4.92)$$

i.e., the estimation error $\epsilon_1$ is uncorrelated with the estimate $\hat X_1$. Because of (4.92), we may make use of relationship (4.65) in a special way. For the one-sided case we may discard the variables with "2" subscripts; therefore, remembering that $P_{11} - P_{01} = 0$, (4.65) may be written

$$(P_1 - P_0)\Lambda_1 = (P_{10} - P_{00})\Lambda_{01} = 0 \qquad (4.93)$$

Then, because of (4.93), we may write

$$P_{10}\Lambda_{01} = P_{10}(P_{10} - P_{00})^{-1}(P_{10} - P_{00})\Lambda_{01} = 0 \qquad (4.94)$$

and thus (4.90) becomes

$$B_1 = P_{11}H_1^*R_1^{-1} \qquad (4.95)$$

i.e., the expression for the filter gain becomes explicitly independent of the $\Lambda$ matrix.


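Similar things happen to equation (4.91) when (4.92) is satisfied. First, (4.46) becomes

$$[P_{10} - P_{00} \mid 0]\begin{bmatrix} \Lambda_{00} \\ \Lambda_{10} \end{bmatrix}G_1 - [P_{10} - P_{00} \mid 0]\begin{bmatrix} -I \\ I \end{bmatrix}K_1^* = 0 \qquad (4.96)$$

This may be written as

$$(P_{10} - P_{00})\,[\Lambda_{00}G_1 + K_1^*] = 0 \qquad (4.97)$$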
which will be satisfied when

$$K_1 = -G_1^*\Lambda_{00} \qquad (4.98)$$

This is the deterministically optimal feedback gain, as can be seen from the fact that the matrix $\Lambda_{00}$ satisfies the differential equation

$$\dot\Lambda_{00} = -(\Lambda_{00}\Gamma_{00} + \Lambda_{01}\Gamma_{10}) - (\Lambda_{00}\Gamma_{00} + \Lambda_{01}\Gamma_{10})^* + K_1^*K_1 \qquad (4.99)$$

Since we have chosen $A_1 = F$, $\Gamma_{10} = 0$, so (4.99) reduces to

$$\dot\Lambda_{00} = -\Lambda_{00}\Gamma_{00} - (\Lambda_{00}\Gamma_{00})^* + K_1^*K_1 \qquad (4.100)$$

Substituting (4.98) into (4.100) and remembering that $\Gamma_{00} = F - G_1K_1$, we have

$$\dot\Lambda_{00} = -\Lambda_{00}F - F^*\Lambda_{00} - \Lambda_{00}G_1G_1^*\Lambda_{00}, \qquad \Lambda_{00}(T) = -I \qquad (4.101)$$

This matrix Riccati equation is the same as that satisfied by $-\Phi^*(T,t)\,[I + \Gamma_1(T,t)]^{-1}\,\Phi(T,t)$. This shows that (4.98) is identical to (1.52) and is thus the deterministically optimal feedback gain for the one-sided case.
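In the scalar case, (4.101) can be integrated and compared with its closed-form solution

$$\Lambda_{00}(t) = -\frac{\Phi^2(T,t)}{1 + G_1^2\int_t^T\Phi^2(T,\tau)\,d\tau}, \qquad \Phi(T,t) = e^{F(T-t)}.$$

We assume here, for illustration only, that this integral term is what $\Gamma_1(T,t)$ denotes in the scalar case; Chapter 1 is not part of this section. The Python sketch below uses hypothetical values of $F$, $G_1$, and $T$, and then forms the gain (4.98).

```python
import math


def riccati_lambda00(F, G1, T, N=2000):
    """Integrate scalar (4.101) backward from Lambda00(T) = -1 with RK4."""
    def rhs(lam):
        return -2.0 * F * lam - G1 * G1 * lam * lam
    dt = T / N
    lam = -1.0
    for _ in range(N):
        # One RK4 step backward in time: Lambda(t - dt) from Lambda(t).
        s1 = rhs(lam)
        s2 = rhs(lam - dt / 2 * s1)
        s3 = rhs(lam - dt / 2 * s2)
        s4 = rhs(lam - dt * s3)
        lam -= dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
    return lam  # Lambda00(0)


def closed_form(F, G1, T, t=0.0):
    """-Phi^2 / (1 + G1^2 * integral_t^T Phi(T, tau)^2 dtau) with Phi(T, t) = exp(F (T - t))."""
    phi2 = math.exp(2.0 * F * (T - t))
    gram = G1 * G1 * (phi2 - 1.0) / (2.0 * F) if F != 0 else G1 * G1 * (T - t)
    return -phi2 / (1.0 + gram)


F, G1, T = -0.5, 1.0, 1.0      # hypothetical scalar data
lam0 = riccati_lambda00(F, G1, T)
K1 = -G1 * lam0                # feedback gain from (4.98)
print(lam0, closed_form(F, G1, T), K1)
```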

4.8 The Matrices B1, B2, K1, and K2

The Hamiltonian is quadratic in $B_1$, $B_2$, $K_1$, and $K_2$, and it is thus possible to obtain explicit expressions for these matrices in terms of the elements of the $P$ and $\Lambda$ matrices. This has been done in (4.90) and (4.91) for $B_1$ and $K_1$, respectively; similar expressions may be obtained for $B_2$ and $K_2$. When these expressions are substituted into the $\Gamma$ matrix in equations (4.31) and (4.34), these two equations constitute a nonlinear two-point boundary value problem. Since both $P$ and $\Lambda$ are symmetric and $3n \times 3n$, each contributes $3n(3n+1)/2$ independent elements, so the total number of variables is $3n(3n+1)$. For the simplest nontrivial example, $n = 1$; this implies that the nonlinear problem has twelve variables.

Solution  of  nonlinear  tvo -point  boundary  value  problems  by 
iterative  computational  methods  is  a  subject  covered  fairly  well  in  the 
literature  [2,10,16,20]  and  will  not  be  discussed  here  in  any  detail. 
However,  when  such  problems  arise  out  of  differential* games,  two 
important  aspects  must  be  considered.  The  first  of  these  is  the  number 
of  variables  involved,  large  even  by  optimal  control  standards.  Whereas 
a  one-sided  stochastic  optimal  control  problem  with  n  ■  1  involves 
solution  for  two  variables,  the  two-player  case  of  the  same  dimension 
involves  solution  for  twelve  variables.  The  Becond  aspect  is  the 
particular  nature  of  the  nonlinear  equations:  specifically,  if  the 
elements  of  the  r,  0,  and  Q  matrices  in  (4.31)  and  (4.34)  were  known, 
these  equations  would  be  linear  differential  equations  with  one-sided 
boundary  conditions.  This  fact  suggests  a  fairly  simple  iterative 
computational  scheme: 

i) Choose an initial set of values for $P(t)$ and $\Lambda(t)$.

ii) On the basis of (i), compute the values of the elements of $\Gamma(t)$, $C(t)$, and $Q(t)$.

iii) Using the values computed in (ii), solve (4.31) and (4.34) as linear equations with one-sided boundary values.

iv) Using the solution obtained in (iii), update the calculations done in (ii).

v) Repeat until the solution converges.

Convergence in step (v) is not guaranteed, of course. It depends on an intelligent choice of initial values in step (i), as well as on fortuitous conditioning of the equations by the physical parameters of the system and by a proper choice of the $A$ matrices.
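The following is a minimal sketch of steps (i) through (v) on a hypothetical scalar problem. It is not (4.31) and (4.34). The toy problem has a forward equation for $P$ with $P(0)$ given and a backward equation for $\Lambda$ with $\Lambda(T)$ given, coupled through a made-up coefficient $g = P\Lambda/(1+P\Lambda)$. Freezing $g$ makes both equations linear with one-sided boundary values, which is exactly the property the scheme exploits. The sketch raises an error when the iteration fails to settle, reflecting the caveat that convergence is not guaranteed.

```python
def solve_toy_tpbvp(a=-1.0, q=1.0, r=1.0, p0=0.5, lam_T=1.0,
                    T=1.0, N=1000, tol=1e-10, max_iter=200):
    """Fixed-point iteration for a hypothetical scalar two-point problem:
        dP/dt   =  (2a - g) P + q,      P(0)   = p0     (forward)
        dLam/dt = -(2a - g) Lam - r,    Lam(T) = lam_T  (backward)
    with the nonlinear coupling g = P*Lam / (1 + P*Lam)."""
    dt = T / N
    # Step (i): initial guess for the trajectories P(t) and Lam(t).
    P = [p0] * (N + 1)
    Lam = [lam_T] * (N + 1)
    for iteration in range(1, max_iter + 1):
        # Step (ii): freeze the coupling coefficient along the current guess.
        g = [P[k] * Lam[k] / (1.0 + P[k] * Lam[k]) for k in range(N + 1)]
        # Step (iii): with g fixed both equations are linear with one-sided
        # boundary values: sweep P forward and Lam backward (explicit Euler).
        P_new = [p0] + [0.0] * N
        for k in range(N):
            P_new[k + 1] = P_new[k] + dt * ((2 * a - g[k]) * P_new[k] + q)
        Lam_new = [0.0] * N + [lam_T]
        for k in range(N, 0, -1):
            Lam_new[k - 1] = Lam_new[k] + dt * ((2 * a - g[k]) * Lam_new[k] + r)
        # Steps (iv)-(v): measure the change, update, and stop on convergence.
        change = max(max(abs(u - v) for u, v in zip(P_new, P)),
                     max(abs(u - v) for u, v in zip(Lam_new, Lam)))
        P, Lam = P_new, Lam_new
        if change < tol:
            return P, Lam, iteration
    raise RuntimeError("fixed-point iteration did not converge")

P, Lam, iters = solve_toy_tpbvp()
print(f"converged in {iters} iterations; P(T) = {P[-1]:.4f}, Lam(0) = {Lam[0]:.4f}")
```

For this toy problem the frozen-coefficient map is a contraction. For the actual game equations, whether it contracts depends on the initial guess and on the conditioning noted above.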

As an alternative to solving the nonlinear problem, we may consider a direct approach to optimization by some gradient technique. However, the convergence difficulties inherent in gradient solutions of one-sided optimal control problems would seem to be increased enormously when two sets of variables are involved, one set minimizing and the other maximizing. Thus, it appears that the indirect approach to differential game problems described in this chapter is, at least in some situations, the most promising method.
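A standard toy example, not taken from this report, shows how opposing gradient updates can fail even when each one-sided problem is trivial. Take simultaneous gradient descent in $x$ and ascent in $y$ on $f(x, y) = xy$, whose saddle point is the origin. Each step multiplies the distance from the saddle by $\sqrt{1+\eta^2}$, so the iterates spiral outward for every step size $\eta > 0$.

```python
import math

def simultaneous_gda(x0, y0, eta, steps):
    """Simultaneous gradient descent in x and ascent in y on f(x, y) = x * y.

    The saddle point is (0, 0). Each step maps the distance r from it to
    r * sqrt(1 + eta**2), so the iterates spiral outward for any eta > 0.
    """
    x, y = x0, y0
    radii = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x                      # df/dx = y, df/dy = x
        x, y = x - eta * grad_x, y + eta * grad_y  # minimizer descends, maximizer ascends
        radii.append(math.hypot(x, y))
    return radii

radii = simultaneous_gda(1.0, 0.0, eta=0.1, steps=100)
print(f"distance from saddle: start {radii[0]:.3f}, after 100 steps {radii[-1]:.3f}")
```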


Chapter 5

OBTAINING  PAYOFF  BOUNDS  FOR  CONSTRAINED  STRATEGIES 

5.1 Removing Constraints on One Controller

In Chapter 4 it was indicated that the optimal coefficient matrices could be obtained by solving a nonlinear differential equation of order $3n(3n+1)$ with split boundary conditions. It was also pointed out that the computational difficulties of doing so are potentially great. It is natural, then, to ask what is obtained in return for the effort required to solve the nonlinear problem, particularly since the solutions obtained give only control functionals that are optimal within a certain, somewhat artificial constraint.

Fortunately, this question is easier to answer than the question of what the optimal control itself is. Once the constrained problem of Chapter 4 is solved, either player may evaluate the solution by comparing the payoff under the constrained solution with the payoff that would result if his opponent were unconstrained. This comparison is easily made. The separation principle tells us that if one controller uses a fixed $n$-dimensional control-generating system, his opponent's optimal opposing strategy is generated by a differential equation of order $2n$.

This fact allows either controller, once he has established the form and parameters of his control-generating system, to obtain a worst-case bound on the payoff when he employs that strategy. He cannot obtain a best-case bound, because the best-case payoff depends on how poorly the opponent chooses his strategy and may be unbounded. He can, of course, solve for the payoff when his opponent uses an optimal constrained control.

5.2 Obtaining Worst-Case Bounds on Payoff

When one player, say number 1, specifies the parameters of his $n$-dimensional control, the system from player number 2's viewpoint may be described by the $2n$-dimensional system of equations


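$$
\frac{d}{dt}\begin{bmatrix} x \\ e_1 \end{bmatrix}
= \begin{bmatrix} F - G_1K_1 & G_1K_1 \\ F - A_1 & A_1 - B_1H_1 \end{bmatrix}
\begin{bmatrix} x \\ e_1 \end{bmatrix}
+ \begin{bmatrix} G_2 \\ G_2 \end{bmatrix} u_2
+ \begin{bmatrix} 0 \\ -B_1 \end{bmatrix} \eta_1
\tag{5.1}
$$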

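Player number 2's observation equation remains

$$z_2 = H_2x + \eta_2 \tag{5.2}$$

which may be rewritten

$$z_2 = \begin{bmatrix} H_2 & 0 \end{bmatrix}\begin{bmatrix} x \\ e_1 \end{bmatrix} + \eta_2 \tag{5.3}$$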

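Thus, the payoff functional may be rewritten as

$$
J(u_2) = E\left\{ \begin{bmatrix} x'(T) & e_1'(T) \end{bmatrix}
\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x(T) \\ e_1(T) \end{bmatrix}
+ \int_{0}^{T} \left( \begin{bmatrix} x'(t) & e_1'(t) \end{bmatrix}
\begin{bmatrix} K_1'K_1 & -K_1'K_1 \\ -K_1'K_1 & K_1'K_1 \end{bmatrix}
\begin{bmatrix} x(t) \\ e_1(t) \end{bmatrix}
- u_2'(t)u_2(t) \right) dt \right\}
\tag{5.4}
$$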


Equations (5.1) through (5.4) constitute a standard one-sided stochastic optimal control problem, whose solution is given by

$$u_2^*(t) = -G_0'(t)S(t)\hat{x}(t) \tag{5.5}$$

where the matrix $S(t)$ satisfies the differential equation

$$\dot{S} = -SF_0 - F_0'S + SG_0G_0'S - A_0, \qquad S(T) = Q_f \tag{5.6}$$

and where $\hat{x}$ satisfies the differential equation

$$\dot{\hat{x}} = [F_0 - G_0G_0'S]\hat{x} + K_0[z_2 - H_0\hat{x}] \tag{5.7}$$


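where

$$K_0 = P_0H_0'R_2^{-1} \tag{5.8}$$

where $P_0$ satisfies

$$\dot{P}_0 = F_0P_0 + P_0F_0' - P_0H_0'R_2^{-1}H_0P_0 + B_0R_1B_0' \tag{5.9}$$

and where we make the identification

$$
F_0 = \begin{bmatrix} F - G_1K_1 & G_1K_1 \\ F - A_1 & A_1 - B_1H_1 \end{bmatrix},\quad
G_0 = \begin{bmatrix} G_2 \\ G_2 \end{bmatrix},\quad
H_0 = \begin{bmatrix} H_2 & 0 \end{bmatrix},\quad
B_0 = \begin{bmatrix} 0 \\ -B_1 \end{bmatrix},
$$
$$
A_0 = \begin{bmatrix} K_1'K_1 & -K_1'K_1 \\ -K_1'K_1 & K_1'K_1 \end{bmatrix},\quad
Q_f = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}
\tag{5.10}
$$

and $R_1$ and $R_2$ are noise covariances as defined previously.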
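The sketch below integrates the two Riccati equations of the opposing strategy numerically for a scalar plant ($n = 1$, so the augmented matrices are $2 \times 2$). The control equation $\dot{S} = -SF_0 - F_0'S + SG_0G_0'S - A_0$, $S(T) = Q_f$, is stepped backward in time. The filter equation $\dot{P}_0 = F_0P_0 + P_0F_0' - P_0H_0'R_2^{-1}H_0P_0 + B_0R_1B_0'$ is stepped forward. Both use explicit Euler steps. The block structure of $F_0, G_0, H_0, B_0, A_0, Q_f$ follows (5.1) through (5.4), and the numerical values, the scalar $R_1$ and $R_2$, and the initial covariance are assumptions.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def integrate_S(F0, G0, A0, Qf, T, N):
    """Explicit Euler steps taken backward in time for
    dS/dt = -S F0 - F0' S + S G0 G0' S - A0,  S(T) = Qf.
    Returns [S(0), S(dt), ..., S(T)]."""
    dt, n = T / N, len(F0)
    GG = matmul(G0, transpose(G0))
    S = [row[:] for row in Qf]
    out = [S]
    for _ in range(N):
        SF = matmul(S, F0)                      # F0' S = (S F0)' because S is symmetric
        SGGS = matmul(matmul(S, GG), S)
        dS = [[-SF[i][j] - SF[j][i] + SGGS[i][j] - A0[i][j] for j in range(n)]
              for i in range(n)]
        S = [[S[i][j] - dt * dS[i][j] for j in range(n)] for i in range(n)]
        out.append(S)
    return out[::-1]

def integrate_P(F0, H0, B0, R1, R2, P_init, T, N):
    """Explicit Euler steps forward in time for
    dP/dt = F0 P + P F0' - P H0' R2^{-1} H0 P + B0 R1 B0'  (scalar R1, R2 assumed).
    Returns [P(0), ..., P(T)]."""
    dt, n = T / N, len(F0)
    BB = matmul(B0, transpose(B0))
    P = [row[:] for row in P_init]
    out = [P]
    for _ in range(N):
        FP = matmul(F0, P)                      # P F0' = (F0 P)' because P is symmetric
        PH = matmul(P, transpose(H0))
        PHHP = matmul(PH, transpose(PH))        # P H0' H0 P
        dP = [[FP[i][j] + FP[j][i] - PHHP[i][j] / R2 + R1 * BB[i][j]
               for j in range(n)] for i in range(n)]
        P = [[P[i][j] + dt * dP[i][j] for j in range(n)] for i in range(n)]
        out.append(P)
    return out

# Hypothetical scalar data, arranged in the block pattern of (5.1)-(5.4).
F, G1, G2, H1, H2, A1, B1, K1 = 0.5, 1.0, 0.8, 1.0, 1.0, -1.0, 2.0, 1.5
R1, R2 = 0.1, 0.2
F0 = [[F - G1 * K1, G1 * K1], [F - A1, A1 - B1 * H1]]
G0 = [[G2], [G2]]
H0 = [[H2, 0.0]]
B0 = [[0.0], [-B1]]
A0 = [[K1 * K1, -K1 * K1], [-K1 * K1, K1 * K1]]
Qf = [[1.0, 0.0], [0.0, 0.0]]

S_traj = integrate_S(F0, G0, A0, Qf, T=1.0, N=2000)
P_traj = integrate_P(F0, H0, B0, R1, R2, [[0.5, 0.0], [0.0, 0.5]], T=1.0, N=2000)
gain_u2 = matmul(transpose(G0), S_traj[0])       # u2*(0) = -gain_u2 x_hat(0)
K0_T = [[v / R2 for v in row] for row in matmul(P_traj[-1], transpose(H0))]  # K0 at t = T
print("G0' S(0) =", gain_u2, "  K0(T) =", K0_T)
```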

The worst-case bound is then obtained by inserting the optimal opposing strategy given by (5.5) through (5.10) into the functional (5.4) and evaluating it.
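Evaluating (5.4) once the opposing strategy is inserted reduces to expectations of quadratic forms in Gaussian random vectors. Because the system, observations, and strategies are all linear, the closed-loop variables stay Gaussian, assuming a Gaussian initial state and noise. The identity used is $E[X'QX] = \operatorname{tr}(Q\Sigma) + m'Qm$ for $X$ with mean $m$ and covariance $\Sigma$. The sketch checks it by Monte Carlo for a 2-dimensional vector, and the numbers are arbitrary.

```python
import math
import random

def expected_quadratic(Q, m, Sigma):
    """E[X' Q X] for Gaussian X with mean m and covariance Sigma: tr(Q Sigma) + m' Q m."""
    n = len(m)
    trace_term = sum(Q[i][j] * Sigma[j][i] for i in range(n) for j in range(n))
    mean_term = sum(m[i] * Q[i][j] * m[j] for i in range(n) for j in range(n))
    return trace_term + mean_term

def monte_carlo_quadratic(Q, m, Sigma, samples=100_000, seed=1):
    """Sample-average estimate of the same expectation (2-dimensional case only)."""
    # Cholesky factor L of Sigma, so that X = m + L w with w standard normal.
    l11 = math.sqrt(Sigma[0][0])
    l21 = Sigma[1][0] / l11
    l22 = math.sqrt(Sigma[1][1] - l21 * l21)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        w1, w2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (m[0] + l11 * w1, m[1] + l21 * w1 + l22 * w2)
        total += sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))
    return total / samples

Q = [[1.0, -0.5], [-0.5, 2.0]]
m = [1.0, -1.0]
Sigma = [[0.5, 0.2], [0.2, 0.3]]
exact = expected_quadratic(Q, m, Sigma)      # 0.9 (trace term) + 4.0 (mean term) = 4.9
approx = monte_carlo_quadratic(Q, m, Sigma)
print(f"closed form {exact:.4f}, Monte Carlo {approx:.4f}")
```

In the bound computation the trace term carries the noise and initial-state covariances, and the mean term carries the initial-state mean.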

An interesting parallel to the development of Chapter 4 is the problem of choosing an optimal control strategy of the form

(5.11)

(5.12)

and where controller number 2 is unconstrained and therefore uses a strategy of the form (5.5) through (5.10).


Chapter  6 
CONCLUSION 


6.1  Summary 

In Chapters 1 and 2, a general stochastic differential game characterized by linear differential equations, a quadratic cost functional, and additive white Gaussian observation noise was presented. It was shown that the certainty-equivalence principle, valid for one-player situations, does not hold for two-player problems. Specifically, suppose one assumes a control form consisting of a matrix transformation of the conditional mean plus a linear operation on the residuals. The matrix transformation is then the deterministic optimal feedback gain. The linear operation on the residuals, however, is not a zero operation, as it is in the one-sided case.

In Chapter 3 it was shown that in order to generate the conditional mean of the state vector, each player was required to store all past observations. Since this requirement is impractical for many real systems, a state estimation scheme was developed that generated the estimate as the solution to a differential equation forced by the observations. The order of this differential equation was that of the controlled system.

Chapter 4 generalized this concept to optimal control strategies within the class of strategies generated as solutions to differential equations forced by the observations. The order of these differential equations was taken to be that of the controlled system. This approach resulted in expressions for the control strategies given in terms of functions that are known only as solutions to a set of nonlinear differential equations with split boundary conditions. A computational approach to solving these equations is suggested.

In Chapter 5 it was pointed out that, once a set of dimensionally constrained strategies is calculated, either player may compute a worst-case bound on the payoff by assuming his opponent uses an unconstrained, and therefore higher-dimensional, strategy. Formulas are given for computing this bound.

6.2  Results  of  Research 

Optimal dimensionally constrained control strategies are of interest in practical problems where computational capacity is limited. Much of the difficulty in choosing a control strategy lies in deciding what one is willing to assume about the opponent's strategy. Computing an optimal unconstrained but linear strategy is quite complicated, so it is reasonable to assume that one's opponent will impose some complexity constraint upon himself. As we have seen in Chapter 4, such constraints may be imposed in various ways, e.g., by specifying the order of the control-generating differential equation. The specific form of one player's self-imposed constraint is unknown to the other player, and in most cases it cannot reasonably be treated as a random variable. It is therefore of interest to compute worst-case bounds on the payoff under varying assumptions about the players' strategies. These bounds may then guide engineering decisions about the utility of a particular strategy.
6.3 Suggestions for Future Investigations

In this work we have analyzed a linear-quadratic-Gaussian problem of a rather uncomplicated type. Natural extensions of this work should follow the patterns established by investigators of one-sided stochastic control problems. Such extensions include cases with plant noise, colored noise, or no noise, and cases where the payoff is described in terms of nonnegative definite rather than positive definite matrices.

Investigation should also continue into the computational aspects of the problem. The indirect approach described in Chapter 4 results in a set of nonlinear equations with split boundary conditions. When the control gains are fixed, these equations separate into sets of linear differential equations with one-sided boundary conditions. It may be possible to exploit this property to simplify the computational problem.

It would also be of interest to investigate direct optimization by some type of gradient method or local optimization scheme, and to determine how the two-sided nature of the problem affects convergence.
APPENDIX


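The general linear-quadratic-Gaussian stochastic differential game functional is given in terms of $L_1$, $L_2$, $K_1$, and $K_2$ by (2.78), which is

$$
\begin{aligned}
J(L_1,L_2,K_1,K_2) = \tfrac{1}{2}E\Big\{&\big\langle \Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)],\\
&\quad\ \Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big\rangle\\
&+ \big\langle K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)],\ K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)]\big\rangle\\
&- \big\langle K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)],\ K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big\rangle\Big\}
\end{aligned}
\tag{A.1}
$$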


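Differentiating the payoff functional with respect to $L_1$ and $L_2$ and setting the results equal to zero, we have

$$
\begin{aligned}
0 = E\Big\{K_1'\Big(&-T_1'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big]\\
&+ K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)]\Big)(Z_1-\hat{Z}_1)'\Big\}
\end{aligned}
\tag{A.2}
$$

$$
\begin{aligned}
0 = E\Big\{K_2'\Big(&T_2'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big]\\
&- K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\Big)(Z_2-\hat{Z}_2)'\Big\}
\end{aligned}
\tag{A.3}
$$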
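These equations will be satisfied if

$$0 = -T_1'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big] + K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] \tag{A.4}$$

and

$$0 = T_2'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big] - K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)] \tag{A.5}$$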

We interpret these equations in the usual manner: the right side of (A.4) is orthogonal to any linear transformation of $Z_1 - \hat{Z}_1$, and the right side of (A.5) is orthogonal to any linear transformation of $Z_2 - \hat{Z}_2$. We define the linear transformations


$$M = \Phi - T_1K_1 + T_2K_2 \tag{A.6}$$

$$A_1 = (T_1'T_1 + I)K_1 \tag{A.7}$$

$$A_2 = (T_2'T_2 - I)K_2 \tag{A.8}$$

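and note that we may write $\hat{x}_1 = X - e_1$ and $\hat{x}_2 = X - e_2$. Thus (A.4) may be written

$$0 = -(T_1'M - K_1)X - A_1[e_1 - L_1(Z_1-\hat{Z}_1)] + T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)] \tag{A.9}$$

and (A.5) becomes

$$0 = (T_2'M - K_2)X + T_2'T_1K_1[e_1 - L_1(Z_1-\hat{Z}_1)] - A_2[e_2 - L_2(Z_2-\hat{Z}_2)] \tag{A.10}$$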

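We may also differentiate the payoff functional with respect to $K_1$ and $K_2$. Doing this and setting the resulting expressions equal to zero, we have

$$0 = -T_1'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big] + K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] \tag{A.11}$$

$$0 = T_2'\big[\Phi X - T_1K_1[\hat{x}_1+L_1(Z_1-\hat{Z}_1)] + T_2K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)]\big] - K_2[\hat{x}_2+L_2(Z_2-\hat{Z}_2)] \tag{A.12}$$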

Again we interpret these equations to mean that the right side of (A.11) is orthogonal to any linear transformation of $\hat{x}_1 + L_1(Z_1-\hat{Z}_1)$, and the right side of (A.12) is orthogonal to any linear transformation of $\hat{x}_2 + L_2(Z_2-\hat{Z}_2)$. We note that (A.11) and (A.12) have the same form as (A.4) and (A.5) and may be written as (A.9) and (A.10). Thus the right side of (A.9) is orthogonal to any linear transformation of $Z_1 - \hat{Z}_1$ or of $\hat{x}_1 + L_1(Z_1-\hat{Z}_1)$. These two relations together imply that the right side of (A.9) is orthogonal to any linear transformation of $\hat{x}_1$. Analogous statements apply to (A.10): its right side is orthogonal to any linear transformation of $\hat{x}_2$ or of $Z_2 - \hat{Z}_2$.

For normal random variables, the error in the conditional-mean state estimate is orthogonal to all linear transformations of the conditional mean and of the observations. We may therefore rewrite (A.9) and (A.10) as

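$$0 = -(T_1'M - K_1)\hat{x}_1 + A_1L_1(Z_1-\hat{Z}_1) + T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)] - A_1e_1 \tag{A.13}$$

$$0 = (T_2'M - K_2)\hat{x}_2 + T_2'T_1K_1[e_1 - L_1(Z_1-\hat{Z}_1)] + A_2L_2(Z_2-\hat{Z}_2) - A_2e_2 \tag{A.14}$$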
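The step to (A.13) and (A.14) rests on the orthogonality property of the conditional mean for jointly Gaussian variables. The following Monte Carlo sketch uses a scalar state $x$ observed as $z = x + v$, with zero means and made-up variances. Here $E[x \mid z] = gz$ with $g = \sigma_x^2/(\sigma_x^2+\sigma_v^2)$, so $E[(x - gz)z] = 0$. A deliberately halved gain leaves the correlation $\sigma_x^2 - \tfrac{1}{2}g(\sigma_x^2+\sigma_v^2) = \tfrac{1}{2}\sigma_x^2$, which equals 1.0 for these numbers.

```python
import random

def orthogonality_check(var_x=2.0, var_v=0.5, samples=100_000, seed=7):
    """Monte Carlo estimates of E[(x - g z) z] for the conditional-mean gain and a bad gain.

    x ~ N(0, var_x), z = x + v with v ~ N(0, var_v) independent of x.
    For zero-mean jointly Gaussian (x, z), E[x | z] = g z with g = var_x / (var_x + var_v).
    """
    rng = random.Random(seed)
    g = var_x / (var_x + var_v)
    g_bad = 0.5 * g                       # deliberately suboptimal linear estimator
    acc_opt = acc_bad = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, var_x ** 0.5)
        z = x + rng.gauss(0.0, var_v ** 0.5)
        acc_opt += (x - g * z) * z
        acc_bad += (x - g_bad * z) * z
    return acc_opt / samples, acc_bad / samples

opt, bad = orthogonality_check()
print(f"conditional mean: {opt:+.4f}   half gain: {bad:+.4f}")
```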

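Again, we recall that the right side of (A.13) is orthogonal to all linear transformations of $\hat{x}_1$ or of $Z_1 - \hat{Z}_1$, and that the right side of (A.14) is orthogonal to all linear transformations of $\hat{x}_2$ and of $Z_2 - \hat{Z}_2$. In particular, this holds for the transformation $(T_1'M - K_1)\hat{x}_1$. Thus, from (A.13),

$$
\begin{aligned}
0 = &-E\big\langle (T_1'M - K_1)\hat{x}_1 - A_1L_1(Z_1-\hat{Z}_1),\ (T_1'M - K_1)\hat{x}_1\big\rangle\\
&+ E\big\langle T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)],\ (T_1'M - K_1)\hat{x}_1\big\rangle
\end{aligned}
\tag{A.15}
$$

It also holds for the transformation $A_1L_1(Z_1-\hat{Z}_1)$ of $Z_1 - \hat{Z}_1$. Therefore,

$$
\begin{aligned}
0 = &-E\big\langle (T_1'M - K_1)\hat{x}_1,\ A_1L_1(Z_1-\hat{Z}_1)\big\rangle + E\big\langle A_1L_1(Z_1-\hat{Z}_1),\ A_1L_1(Z_1-\hat{Z}_1)\big\rangle\\
&+ E\big\langle T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)],\ A_1L_1(Z_1-\hat{Z}_1)\big\rangle
\end{aligned}
\tag{A.16}
$$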

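Adding (A.15) and (A.16), we have

$$
\begin{aligned}
0 = &-E\big\langle (T_1'M - K_1)\hat{x}_1,\ (T_1'M - K_1)\hat{x}_1\big\rangle + E\big\langle A_1L_1(Z_1-\hat{Z}_1),\ A_1L_1(Z_1-\hat{Z}_1)\big\rangle\\
&+ E\big\langle T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)],\ (T_1'M - K_1)\hat{x}_1 + A_1L_1(Z_1-\hat{Z}_1)\big\rangle
\end{aligned}
\tag{A.17}
$$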


The second and third terms on the right depend only on the covariances of the noise and the initial state. The first term also depends on the mean of the initial state. Thus, for (A.17) to be satisfied for all values of the initial state, we must have

$$T_1'M - K_1 = 0 \tag{A.18}$$

This being true, and since the term $A_1e_1$ is orthogonal to every linear transformation of $\hat{x}_1$ and of $Z_1 - \hat{Z}_1$ and may therefore be dropped, equation (A.13) reduces to

$$0 = A_1L_1(Z_1-\hat{Z}_1) + T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)] \tag{A.19}$$


Furthermore, (A.18), (A.6), and (A.7) lead to an alternative representation of $A_1$,

$$A_1 = T_1'(\Phi + T_2K_2) \tag{A.20}$$

so that (A.19) may be written

$$0 = T_1'(\Phi + T_2K_2)L_1(Z_1-\hat{Z}_1) + T_1'T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)] \tag{A.21}$$

which will be satisfied if

$$0 = (\Phi + T_2K_2)L_1(Z_1-\hat{Z}_1) + T_2K_2[e_2 - L_2(Z_2-\hat{Z}_2)] \tag{A.22}$$

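Since (A.22) is interpreted to mean that its right side is orthogonal to any linear transformation of $Z_1 - \hat{Z}_1$, we must have

$$(\Phi + T_2K_2)L_1\Sigma_{Z_1Z_1} - T_2K_2L_2\Sigma_{Z_2Z_1} = -T_2K_2\Sigma_{e_2Z_1} \tag{A.23}$$

where $\Sigma_{Z_1Z_1} = E[(Z_1-\hat{Z}_1)(Z_1-\hat{Z}_1)']$, etc.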

A completely analogous manipulation starting with (A.14) leads us first to the conclusion that

$$T_2'M - K_2 = 0 \tag{A.24}$$

This then reduces (A.14) to

$$0 = T_2'T_1K_1[e_1 - L_1(Z_1-\hat{Z}_1)] + A_2L_2(Z_2-\hat{Z}_2) - A_2e_2 \tag{A.25}$$


Then (A.24), (A.6), and (A.8) give an alternative expression for $A_2$,

$$A_2 = -T_2'(\Phi - T_1K_1) \tag{A.26}$$

so that (A.25) may be written

$$0 = T_2'T_1K_1[e_1 - L_1(Z_1-\hat{Z}_1)] - T_2'(\Phi - T_1K_1)L_2(Z_2-\hat{Z}_2) + T_2'(\Phi - T_1K_1)e_2 \tag{A.27}$$
which will be satisfied if

$$0 = T_1K_1[e_1 - L_1(Z_1-\hat{Z}_1)] - (\Phi - T_1K_1)L_2(Z_2-\hat{Z}_2) + (\Phi - T_1K_1)e_2 \tag{A.28}$$

As before, we interpret this to mean that the right side of (A.28) is orthogonal to any linear transformation of $Z_2 - \hat{Z}_2$. Since $E[e_2(Z_2-\hat{Z}_2)'] = 0$, it follows that

$$T_1K_1L_1\Sigma_{Z_1Z_2} + (\Phi - T_1K_1)L_2\Sigma_{Z_2Z_2} = T_1K_1\Sigma_{e_1Z_2} \tag{A.29}$$

We have in (A.18), (A.23), (A.24), and (A.29) a set of four simultaneous equations describing $K_1$, $K_2$, $L_1$, and $L_2$. We may solve for $K_1$ and $K_2$ quite easily from (A.18) and (A.24). Using (A.6) together with (A.18) and (A.24), we have

$$
\begin{aligned}
T_1'\Phi - T_1'T_1K_1 + T_1'T_2K_2 &= K_1\\
T_2'\Phi - T_2'T_1K_1 + T_2'T_2K_2 &= K_2
\end{aligned}
\tag{A.30}
$$

Comparing these equations to (1.44) and (1.45), we see they are similar in form; thus, we have the solutions

$$K_1 = T_1'[I + T_1T_1' - T_2T_2']^{-1}\Phi \tag{A.31}$$

$$K_2 = T_2'[I + T_1T_1' - T_2T_2']^{-1}\Phi \tag{A.32}$$
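The closed-form gains can be checked numerically. Eliminating $M$ from (A.30) gives $K_i = T_i'[I + T_1T_1' - T_2T_2']^{-1}\Phi$. The sketch computes $K_1$ and $K_2$ this way and then confirms the stationarity conditions (A.18) and (A.24), $K_i = T_i'M$ with $M = \Phi - T_1K_1 + T_2K_2$. It also confirms the alternative forms (A.20) and (A.26) for $A_1$ and $A_2$. The sizes and values of $\Phi$, $T_1$, $T_2$ are hypothetical, and the sketch assumes $I + T_1T_1' - T_2T_2'$ is nonsingular.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, a=1.0, b=1.0):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def solve(A, B):
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def game_gains(Phi, T1, T2):
    """K_i = T_i' [I + T1 T1' - T2 T2']^{-1} Phi, i.e. (A.31) and (A.32)."""
    D = add(add(identity(len(Phi)), matmul(T1, transpose(T1))),
            matmul(T2, transpose(T2)), b=-1.0)
    M = solve(D, Phi)                  # equals Phi - T1 K1 + T2 K2 at the solution
    return matmul(transpose(T1), M), matmul(transpose(T2), M)

# Hypothetical data: 2-dimensional state, scalar controls.
Phi = [[1.0, 0.3], [0.0, 1.2]]
T1 = [[1.0], [0.5]]
T2 = [[0.4], [0.2]]
K1, K2 = game_gains(Phi, T1, T2)
M = add(add(Phi, matmul(T1, K1), b=-1.0), matmul(T2, K2))          # definition (A.6)
A1 = matmul(add(matmul(transpose(T1), T1), identity(1)), K1)        # definition (A.7)
A2 = matmul(add(matmul(transpose(T2), T2), identity(1), b=-1.0), K2)  # definition (A.8)
print("residual (A.18):", max_abs_diff(K1, matmul(transpose(T1), M)))
print("residual (A.24):", max_abs_diff(K2, matmul(transpose(T2), M)))
```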
These expressions may be substituted into (A.23) and (A.29), but for notational compactness it is better to retain the equations in their present form, which is

$$(\Phi + T_2K_2)L_1\Sigma_{Z_1Z_1} - T_2K_2L_2\Sigma_{Z_2Z_1} = -T_2K_2\Sigma_{e_2Z_1} \tag{A.23}$$

$$T_1K_1L_1\Sigma_{Z_1Z_2} + (\Phi - T_1K_1)L_2\Sigma_{Z_2Z_2} = T_1K_1\Sigma_{e_1Z_2} \tag{A.29}$$
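With the gains fixed, (A.23) and (A.29) are linear in $L_1$ and $L_2$. The scalar case makes this concrete. Every matrix is a number, $\Sigma_{Z_2Z_1} = \Sigma_{Z_1Z_2}$, and the gains are $k_i = t_i\phi/(1 + t_1^2 - t_2^2)$ from (A.31) and (A.32). The two equations can then be solved by Cramer's rule. The covariance values below are hypothetical; in an application they would follow from the noise and initial-state statistics.

```python
def scalar_game_gains(phi, t1, t2):
    """Scalar case of (A.31)-(A.32): k_i = t_i * phi / (1 + t1**2 - t2**2)."""
    d = 1.0 + t1 * t1 - t2 * t2
    if abs(d) < 1e-12:
        raise ValueError("1 + t1^2 - t2^2 is singular")
    return t1 * phi / d, t2 * phi / d

def solve_L(phi, t1, t2, s11, s12, s22, se2z1, se1z2):
    """Solve the scalar forms of (A.23) and (A.29) for l1, l2 by Cramer's rule:
        (phi + t2 k2) s11 l1 - t2 k2 s12 l2 = -t2 k2 se2z1
        t1 k1 s12 l1 + (phi - t1 k1) s22 l2 =  t1 k1 se1z2
    """
    k1, k2 = scalar_game_gains(phi, t1, t2)
    a11, a12, b1 = (phi + t2 * k2) * s11, -t2 * k2 * s12, -t2 * k2 * se2z1
    a21, a22, b2 = t1 * k1 * s12, (phi - t1 * k1) * s22, t1 * k1 * se1z2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("(A.23) and (A.29) are singular for these data")
    l1 = (b1 * a22 - a12 * b2) / det
    l2 = (a11 * b2 - a21 * b1) / det
    return k1, k2, l1, l2

phi, t1, t2 = 1.0, 1.0, 0.4              # hypothetical scalar data
s11, s12, s22 = 1.5, 0.3, 1.2            # hypothetical innovation covariances
se2z1, se1z2 = 0.2, -0.1                 # hypothetical error/innovation cross-covariances
k1, k2, l1, l2 = solve_L(phi, t1, t2, s11, s12, s22, se2z1, se1z2)
res23 = (phi + t2 * k2) * s11 * l1 - t2 * k2 * s12 * l2 + t2 * k2 * se2z1
res29 = t1 * k1 * s12 * l1 + (phi - t1 * k1) * s22 * l2 - t1 * k1 * se1z2
print(f"l1 = {l1:.4f}, l2 = {l2:.4f}, residuals {res23:.1e}, {res29:.1e}")
```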


The above equations are necessary conditions that must be satisfied by the linear operations on noisy state observations that make up part of the strategies assumed in (2.38) and (2.39).


